HTML::Summary - module for generating a summary from a web page.


HTML-Summary documentation. Contained in the HTML-Summary distribution.

NAME

HTML::Summary - module for generating a summary from a web page.

SYNOPSIS

    use HTML::Summary;
    use HTML::TreeBuilder;

    # $document holds the HTML source of the page as a string
    my $tree = HTML::TreeBuilder->new;
    $tree->parse( $document );
    $tree->eof;    # tell the parser that the document is complete

    my $summarizer = HTML::Summary->new(
        LENGTH      => 200,
        USE_META    => 1,
    );

    my $summary = $summarizer->generate( $tree );
    $summarizer->option( 'USE_META' => 1 );
    my $length = $summarizer->option( 'LENGTH' );
    if ( $summarizer->meta_used( ) )
    {
        # do something ...
    }

DESCRIPTION

The HTML::Summary module produces summaries from the textual content of web pages. It does so using the location heuristic, which determines the value of a given sentence based on its position and status within the document; for example, headings, section titles and opening paragraph sentences may be favoured over other textual content. A LENGTH option can be used to restrict the length of the summary produced.
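
The location heuristic described above can be illustrated with a minimal sketch. The tag weights, the scoring formula, and the helper names score_sentence and summarize are assumptions made for this example only; they are not taken from the HTML::Summary source and the module may score sentences quite differently.

```perl
use strict;
use warnings;

# Hypothetical weights: illustrative assumptions, not the values
# HTML::Summary uses internally.
my %TAG_WEIGHT = ( h1 => 3.0, h2 => 2.0, p => 1.0 );

# Score a sentence by the tag it came from and its position within
# that element (0 = first sentence); earlier sentences score higher.
sub score_sentence {
    my ( $tag, $position ) = @_;
    my $weight = $TAG_WEIGHT{$tag} // 0.5;
    return $weight / ( 1 + $position );
}

# Greedily pick the highest-scoring sentences that fit in $max_len
# characters (equal to bytes for ASCII text), then return them in
# their original document order, joined by single spaces.
sub summarize {
    my ( $max_len, @sentences ) = @_;
    my @order = sort {
        $sentences[$b]{score} <=> $sentences[$a]{score} || $a <=> $b
    } 0 .. $#sentences;

    my @chosen;
    my $used = 0;
    for my $i (@order) {
        # Count the joining space for every sentence after the first.
        my $len = length( $sentences[$i]{text} ) + ( @chosen ? 1 : 0 );
        next if $used + $len > $max_len;
        push @chosen, $i;
        $used += $len;
    }
    return join ' ', map { $sentences[$_]{text} } sort { $a <=> $b } @chosen;
}

my @sentences = (
    { text => 'Perl Modules.',                  score => score_sentence( 'h1', 0 ) },
    { text => 'Modules package reusable code.', score => score_sentence( 'p',  0 ) },
    { text => 'They are distributed on CPAN.',  score => score_sentence( 'p',  1 ) },
);

print summarize( 45, @sentences ), "\n";
```

With a limit of 45, the heading and the opening paragraph sentence fit, while the lower-scoring second paragraph sentence is dropped.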

CONSTRUCTOR

new( $attr1 => $value1 [, $attr2 => $value2 ] )

Possible attributes are:

VERBOSE

Generate verbose messages to STDERR.

LENGTH

Maximum length of summary (in bytes). Default is 500.

USE_META

Flag telling the summarizer whether to use the content of the <META> tag in the page header, if one is present, instead of generating a summary from the body text. Setting USE_META overrides LENGTH: the summary provided by the <META> tag is returned in full, even if it is longer than LENGTH bytes. Default is 0 (no).

    my $summarizer = HTML::Summary->new( LENGTH => 200 );
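
The precedence between USE_META and LENGTH can be sketched in plain Perl. The helper choose_summary and its meta and body arguments are hypothetical and not part of HTML::Summary; the substr call is only a crude stand-in for the module's sentence selection.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented precedence; it is not
# part of the HTML::Summary API.
sub choose_summary {
    my (%arg) = @_;
    my $length = $arg{LENGTH} // 500;    # documented default

    # USE_META wins: the META text is returned whole, ignoring LENGTH.
    return $arg{meta}
        if $arg{USE_META} && defined $arg{meta} && length $arg{meta};

    # Otherwise fall back to the body text, cut to at most LENGTH.
    return substr( $arg{body}, 0, $length );
}

my $meta = 'A long description taken from the page header.';
my $body = 'Body text of the page that would otherwise be summarized.';

# The META text comes back in full although it exceeds LENGTH.
print choose_summary( LENGTH => 10, USE_META => 1, meta => $meta, body => $body ), "\n";

# With USE_META off, the result is limited to LENGTH.
print choose_summary( LENGTH => 10, USE_META => 0, meta => $meta, body => $body ), "\n";
```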

METHODS

option( )

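Get or set HTML::Summary configuration options. Called with an option name, it returns the current value; called with a name => value pair, it sets the option.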

    my $length = $summarizer->option( 'LENGTH' );
    $summarizer->option( 'USE_META' => 1 );

generate( $tree )

Takes an HTML::Element object (for example, an HTML::TreeBuilder tree) and returns a summary generated from it.

    my $tree = HTML::TreeBuilder->new;
    $tree->parse( $document );
    $tree->eof;    # signal the end of the document to the parser
    my $summary = $summarizer->generate( $tree );

meta_used( )

Returns 1 if the META tag description was used to generate the summary.

    if ( $summarizer->meta_used() )
    {
        # do something ...
    }

SEE ALSO

    HTML::TreeBuilder
    Text::Sentence
    Lingua::JA::Jcode
    Lingua::JA::Jtruncate

AUTHORS

    Ave Wrigley <wrigley@cre.canon.co.uk>
    Tony Rose <tgr@cre.canon.co.uk>
    Neil Bowers <neilb@cre.canon.co.uk>

COPYRIGHT
