Lingua::JA::Summarize::Extract - summary generator for Japanese


Lingua-JA-Summarize-Extract documentation Contained in the Lingua-JA-Summarize-Extract distribution.

NAME

Lingua::JA::Summarize::Extract - summary generator for Japanese

SYNOPSIS

    use strict;
    use warnings;
    use utf8;
    use Lingua::JA::Summarize::Extract;

    my $text = '日本語の文章を適当に書く。';
    my $summary = Lingua::JA::Summarize::Extract->extract($text);

    print $summary->as_string;
    print "$summary";

    # limit the summary to a length of 20
    $summary->length(20);
    print "$summary";

    # set the charset used by MeCab
    my $extractor = Lingua::JA::Summarize::Extract->new({ mecab_charset => 'utf8' });

DESCRIPTION

Lingua::JA::Summarize::Extract is a summary generator for Japanese text. The extraction method can be changed with the plug-in mechanism.

METHODS

new([options])

Creates an object using the given options.

extract(text[, options])

Summarizes text. When called as a class method, a new object is first constructed from options. Returns a Lingua::JA::Summarize::Extract::ResultSet object.
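
The two ways of calling extract can be illustrated with a minimal, self-contained sketch. The class My::Summarizer below is a hypothetical stand-in, not part of this distribution; only the calling convention (reuse the object when called on an instance, build one from the options when called on the class) follows the module source.

```perl
use strict;
use warnings;

# Hypothetical stand-in class; only the calling convention mirrors extract().
package My::Summarizer;

sub new {
    my ($class, $opt) = @_;
    return bless { %{ $opt || {} } }, $class;
}

sub extract {
    my ($invocant, $text, @opt) = @_;
    # Reuse an existing object, or build a temporary one from the options.
    my $self = ref $invocant ? $invocant : $invocant->new(@opt);
    return { text => $text, rate => $self->{rate} || 1 };
}

package main;

# Class-method call: options are passed after the text.
my $r1 = My::Summarizer->extract('text', { rate => 2 });

# Instance-method call: the options given to new() are used.
my $obj = My::Summarizer->new({ rate => 3 });
my $r2  = $obj->extract('text');

print "$r1->{rate} $r2->{rate}\n";    # prints "2 3"
```

Constructing the object once and calling extract on it avoids repeating the options for every text.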

OPTIONS

The processing can be customized by passing options to the constructor.

plugins

Word splitting, sentence splitting, scoring, and other processing steps can be handled by other modules. Pass the plugin names as an ARRAY reference.

rate

Changes the weight used in scoring. Defaults to 1.

For other options, refer to the POD of each plugin.
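
How the entries in plugins map to package names can be sketched as follows. The helper resolve_plugin is hypothetical and only mirrors the name resolution seen in add_plugin in the module source: a name starting with "+" is taken as a full package name, and any other name is placed under the class's Plugin namespace. The package My::Custom::Scoring is an invented example name.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the name resolution in add_plugin().
sub resolve_plugin {
    my ($base, $plugin) = @_;
    return $plugin =~ /^\+(.+)$/
        ? $1
        : sprintf('%s::Plugin::%s', $base, $plugin);
}

my $base = 'Lingua::JA::Summarize::Extract';

# A short name is looked up under "<class>::Plugin::".
my $short = resolve_plugin($base, 'Parser::Ngram');

# A leading "+" gives the full package name (invented example).
my $full = resolve_plugin($base, '+My::Custom::Scoring');

print "$short\n";    # Lingua::JA::Summarize::Extract::Plugin::Parser::Ngram
print "$full\n";     # My::Custom::Scoring
```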

THANKS TO

Tatsuhiko Miyagawa

AUTHOR

Kazuhiro Osawa <ko@yappo.ne.jp>

SEE ALSO

http://gensen.dl.itc.u-tokyo.ac.jp/, http://www.ryo.com/getsen/

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.



package Lingua::JA::Summarize::Extract;

use strict;
use base qw( Class::Accessor::Fast );
__PACKAGE__->mk_accessors(qw/ text rate /);

use Carp ();
use UNIVERSAL::require;

our $VERSION = '0.02';

use Lingua::JA::Summarize::Extract::ResultSet;

my %DefaultPlugins = (
    scoring  => 'Scoring::Base',
    parse    => 'Parser::Ngram',
    sentence => 'Sentence::Base',
);

sub new {
    my $class = shift;
    my $self = $class->SUPER::new(@_);

    for my $plugin (@{ $self->{plugins} }) {
        $self->add_plugin($plugin);
    }

    for my $method (keys %DefaultPlugins) {
        $self->add_plugin($DefaultPlugins{$method}) unless $self->can($method);
    }

    $self->{rate} ||= 1;

    $self;
}

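sub add_plugin {
    my($self, $plugin) = @_;
    my $class = ref $self;

    # A leading "+" marks a fully qualified package name; any other name
    # is resolved under "<class>::Plugin::".
    my $package = ($plugin =~ /^\+(.+)$/) ? $1 :
        sprintf '%s::Plugin::%s', $class, $plugin;
    {
        no strict 'refs';
        # Load the plugin, then put it at the front of the class's @ISA
        # so that its methods can be called on the object.
        $package->require or Carp::croak($@);
        unshift @{"$class\::ISA"}, $package;
    }
    $package->init($self);
}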

sub extract {
    my($class, $text, @opt) = @_;
    # Called on an object: reuse it. Called on the class: build one from @opt.
    my $self = ref $class ? $class : $class->new(@opt);

    utf8::decode($text);
    $self->text($text) if $text;

    Lingua::JA::Summarize::Extract::ResultSet->new({
        %{ $self },
        summary   => $self->summarize || [],
        sentences => $self->sentence || [],
    });
}

sub summarize {
    my($self, $text) = @_;
    $self->text($text) if $text;
    $self->scoring($self->parse);
}

1;

__END__