| HTML-Summary documentation | view source | Contained in the HTML-Summary distribution. |
HTML::Summary - module for generating a summary from a web page.
use HTML::Summary;
use HTML::TreeBuilder;

# Parse the HTML source held in $document into a tree.
my $tree = HTML::TreeBuilder->new;
$tree->parse( $document );
$tree->eof;

my $summarizer = HTML::Summary->new(
    LENGTH   => 200,
    USE_META => 1,
);

my $summary = $summarizer->generate( $tree );

$summarizer->option( 'USE_META' => 1 );
my $length = $summarizer->option( 'LENGTH' );

if ( $summarizer->meta_used( ) )
{
    # do something
}
The HTML::Summary module produces summaries from the textual content of
web pages. It does so using the location heuristic, which determines the value
of a given sentence based on its position and status within the document; for
example, headings, section titles and opening paragraph sentences may be
favoured over other textual content. A LENGTH option can be used to restrict
the length of the summary produced.
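To make the location heuristic concrete, the following is a minimal, self-contained sketch of position-and-role scoring. It is not HTML::Summary's actual code: the weights, the role names, the decay formula, and the functions score_sentence and summarize are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical weights per structural role; NOT taken from HTML::Summary.
my %WEIGHT = ( heading => 3, first_in_paragraph => 2, body => 1 );

# Score a sentence from its role and its 0-based position in the document.
# Earlier sentences score higher; the decay used here is an arbitrary choice.
sub score_sentence {
    my ($s) = @_;
    my $role_weight = $WEIGHT{ $s->{role} } // 1;
    return $role_weight / ( 1 + $s->{position} );
}

# Pick the highest-scoring sentences that fit within $max_length, then
# emit them in document order. length() counts characters, which equals
# bytes only for single-byte text such as the ASCII sample below.
sub summarize {
    my ( $sentences, $max_length ) = @_;
    my @ranked = sort { score_sentence($b) <=> score_sentence($a) } @$sentences;
    my @chosen;
    my $used = 0;
    for my $s (@ranked) {
        # One extra character for the joining space after the first sentence.
        my $cost = length( $s->{text} ) + ( @chosen ? 1 : 0 );
        next if $used + $cost > $max_length;
        push @chosen, $s;
        $used += $cost;
    }
    return join ' ',
        map  { $_->{text} }
        sort { $a->{position} <=> $b->{position} } @chosen;
}

my @sentences = (
    { text => 'Perl Summaries.',            role => 'heading',            position => 0 },
    { text => 'This page explains them.',   role => 'first_in_paragraph', position => 1 },
    { text => 'Some minor detail follows.', role => 'body',               position => 2 },
);

my $summary = summarize( \@sentences, 45 );
print "$summary\n";
```

With a limit of 45, the heading and the opening sentence fit and the lower-scoring body sentence is dropped, which mirrors how a length limit interacts with the ranking.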
Possible constructor attributes are:
VERBOSE: Generate verbose messages to STDERR.
LENGTH: Maximum length of summary (in bytes). Default is 500.
USE_META: Flag to tell the summarizer whether to use the content of the <META> tag
in the page header, if one is present, instead of generating a summary from the
body text. Note that setting USE_META overrides the LENGTH option: the summary
provided by the <META> tag is returned in full, even if it is longer than
LENGTH bytes. Default is 0 (no).
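The way USE_META takes precedence over LENGTH can be sketched as a small decision function. This is an illustration of the documented behaviour only, under stated assumptions: choose_summary is a hypothetical helper that is not part of HTML::Summary, and plain truncation with substr stands in for the real summarization step.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::Summary.
# Returns ( summary, meta_used_flag ).
sub choose_summary {
    my (%arg) = @_;
    my $length = $arg{LENGTH} // 500;    # documented default

    # With USE_META set and a META description present, return it in full,
    # ignoring LENGTH.
    if ( $arg{USE_META} && defined $arg{meta} && length $arg{meta} ) {
        return ( $arg{meta}, 1 );
    }

    # Otherwise fall back to the body text, limited to LENGTH.
    # Truncation is a stand-in for real summarization.
    return ( substr( $arg{body}, 0, $length ), 0 );
}

my $meta = 'A long author-supplied description of the page.';
my $body = 'Body text that the summarizer would otherwise condense.';

my ( $with_meta, $meta_used ) =
    choose_summary( LENGTH => 10, USE_META => 1, meta => $meta, body => $body );
my ( $without, $unused ) =
    choose_summary( LENGTH => 10, USE_META => 0, meta => $meta, body => $body );

print "with USE_META:    $with_meta\n";
print "without USE_META: $without\n";
```

In the first call the description comes back whole even though it is longer than 10; in the second the result is capped at 10.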
my $summarizer = HTML::Summary->new( LENGTH => 200 );
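option( ) gets or sets HTML::Summary configuration options: called with a name it returns the current value, and called with a name/value pair it sets the option.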
my $length = $summarizer->option( 'LENGTH' );
$summarizer->option( 'USE_META' => 1 );
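A combined get/set accessor of this shape is a common Perl idiom. The sketch below shows one way it could be written; the package My::Configurable is hypothetical, and nothing here is taken from HTML::Summary's source. In particular, returning the new value after a set is an assumption of this sketch.

```perl
use strict;
use warnings;

package My::Configurable;    # hypothetical class for illustration

sub new {
    my ( $class, %opt ) = @_;
    # Defaults first, caller-supplied options override them.
    return bless { LENGTH => 500, USE_META => 0, %opt }, $class;
}

sub option {
    my ( $self, $name, $value ) = @_;
    # A third argument means "set"; otherwise this is a plain "get".
    $self->{$name} = $value if @_ > 2;
    return $self->{$name};
}

package main;

my $obj = My::Configurable->new( LENGTH => 200 );
print $obj->option('LENGTH'), "\n";      # get
$obj->option( 'USE_META' => 1 );         # set
print $obj->option('USE_META'), "\n";    # get again
```

Checking the argument count rather than whether the value is defined lets a caller set an option to undef explicitly.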
generate( ) takes an HTML::Element object (such as an HTML::TreeBuilder tree) and generates a summary from it.
my $tree = HTML::TreeBuilder->new;
$tree->parse( $document );
$tree->eof;    # signal end of input so the tree is complete
my $summary = $summarizer->generate( $tree );
meta_used( ) returns 1 if the META tag description was used to generate the summary.
if ( $summarizer->meta_used() )
{
    # do something ...
}
HTML::TreeBuilder
Text::Sentence
Lingua::JA::Jcode
Lingua::JA::Jtruncate
Ave Wrigley <wrigley@cre.canon.co.uk>
Tony Rose <tgr@cre.canon.co.uk>
Neil Bowers <neilb@cre.canon.co.uk>
Copyright (c) 1997 Canon Research Centre Europe (CRE). All rights reserved. This script and any associated documentation or files cannot be distributed outside of CRE without express prior permission from CRE.