Lingua::EN::Segmenter::TextTiling - Segment text using the TextTiling method


Lingua-EN-Segmenter documentation  | view source Contained in the Lingua-EN-Segmenter distribution.

NAME

Lingua::EN::Segmenter::TextTiling - Segment text using the TextTiling method

SYNOPSIS

  use strict;
  use warnings;
  use lib '.';    # optional: also look for modules in the current directory
  use Lingua::EN::Segmenter::TextTiling qw(segments);
  use Lingua::EN::Splitter;

  my $text = <<EOT;
  Lingua::EN::Segmenter is a useful module that allows text to be split up
  into words, paragraphs, segments, and tiles.

  Paragraphs are by default indicated by blank lines. Known segment breaks are
  indicated by a line with only the word "segment_break" in it.

  The module detects paragraphs that are unrelated to each other by comparing
  the number of words per-paragraph that are related. The algorithm is designed
  to work only on long segments.

  SOUTH OF BAGHDAD, Iraq (CNN) -- Seven U.S. troops freed Sunday after being
  held by Iraqi forces arrived by helicopter at a base south of Baghdad and were
  transferred to a C-130 transport plane headed for Kuwait, CNN's Bob Franken
  reported from the scene.

  EOT

  # Ask for one segment break, i.e. two segments
  my $num_segment_breaks = 1;
  my @segments = segments($num_segment_breaks, $text);
  print $segments[0]; # Prints the first three paragraphs of the above text
  print "\n----------SEGMENT_BREAK----------\n";
  print $segments[1]; # Prints the last paragraph of the above text

  # The related Lingua::EN::Splitter module provides an object-oriented
  # interface for splitting text into words
  my $splitter = Lingua::EN::Splitter->new;
  my @words = $splitter->words($text);
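
The sample text above describes the expected input format: paragraphs are separated by blank lines, and a known segment break is marked by a line containing only the word segment_break. The standalone sketch below shows one way to parse text written in that format. The helper name split_marked_paragraphs is hypothetical and not part of this distribution, and the module itself may handle these markers differently.

```perl
use strict;
use warnings;

# Hypothetical helper: split text into paragraphs on blank lines and record
# known segment breaks, marked by a line containing only "segment_break".
# Returns (\@paragraphs, \@known_breaks); each known break is the (0-based)
# index of the paragraph that starts a new segment.
sub split_marked_paragraphs {
    my ($text) = @_;
    my (@paragraphs, @known_breaks, @current);
    my $pending_break = 0;

    # Close the paragraph being built, if any, and attach a pending break.
    my $flush = sub {
        return unless @current;
        push @known_breaks, scalar @paragraphs if $pending_break && @paragraphs;
        push @paragraphs, join "\n", @current;
        @current       = ();
        $pending_break = 0;
    };

    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*segment_break\s*$/) {
            $flush->();
            $pending_break = 1;   # the next paragraph starts a new segment
        }
        elsif ($line =~ /^\s*$/) {
            $flush->();           # a blank line ends the current paragraph
        }
        else {
            push @current, $line;
        }
    }
    $flush->();
    return (\@paragraphs, \@known_breaks);
}

my $sample = join "\n",
    "First paragraph, line one.",
    "Still the first paragraph.",
    "",
    "Second paragraph.",
    "segment_break",
    "",
    "Third paragraph.";

my ($paras, $breaks) = split_marked_paragraphs($sample);
print scalar(@$paras), " paragraphs; known break before paragraph @$breaks\n";
```

A marker that appears before the first paragraph is ignored, since a break there would not separate anything.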

DESCRIPTION

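This module splits English text into segments using the TextTiling method. TextTiling, an algorithm introduced by Marti Hearst, compares the vocabulary of adjacent blocks of text and places segment breaks where lexical similarity drops most sharply, so that each segment tends to cover a single subtopic. The segments function takes the number of segment breaks to find and the text to segment, and returns the list of segments; see the SYNOPSIS for an example.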
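
To make the idea concrete, here is a simplified, self-contained sketch of TextTiling-style segmentation. It is not this module's implementation: it treats whole paragraphs as the blocks being compared (Hearst's original works on fixed-size token sequences), uses raw lowercase word counts with no stopword removal or stemming, and the function name tile_segments and its optional $window parameter are hypothetical. It does follow the same ($num_segment_breaks, $text) argument order as segments() in the SYNOPSIS.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Count lowercase alphabetic tokens across a list of paragraphs.
sub word_counts {
    my %count;
    for my $para (@_) {
        $count{ lc $1 }++ while $para =~ /([A-Za-z]+)/g;
    }
    return \%count;
}

# Cosine similarity of two word-count hashes (0 if either is empty).
sub cosine {
    my ($x, $y) = @_;
    my ($dot, $nx, $ny) = (0, 0, 0);
    $dot += $x->{$_} * ($y->{$_} // 0) for keys %$x;
    $nx  += $_ ** 2 for values %$x;
    $ny  += $_ ** 2 for values %$y;
    return ($nx && $ny) ? $dot / sqrt($nx * $ny) : 0;
}

# Hypothetical: split $text into $num_breaks + 1 segments, TextTiling style.
sub tile_segments {
    my ($num_breaks, $text, $window) = @_;
    $window //= 2;                          # paragraphs per block on each side
    my @paras = grep { /\S/ } split /\n[ \t]*\n/, $text;
    my $gaps  = @paras - 1;                 # candidate break positions
    return (join "\n\n", @paras) if $num_breaks < 1 || $gaps < 1;

    # 1. Lexical similarity across each gap (gap $i lies after paragraph $i).
    my @sim;
    for my $i (0 .. $gaps - 1) {
        my $lo = max(0, $i - $window + 1);
        my $hi = min($i + $window, $#paras);
        push @sim, cosine(word_counts(@paras[$lo .. $i]),
                          word_counts(@paras[$i + 1 .. $hi]));
    }

    # 2. Depth score: how far similarity rises to the nearest peak on each side.
    my @depth;
    for my $i (0 .. $gaps - 1) {
        my ($l, $r) = ($i, $i);
        $l-- while $l > 0 && $sim[$l - 1] >= $sim[$l];
        $r++ while $r < $gaps - 1 && $sim[$r + 1] >= $sim[$r];
        push @depth, ($sim[$l] - $sim[$i]) + ($sim[$r] - $sim[$i]);
    }

    # 3. Keep the $num_breaks deepest gaps, restored to text order.
    my @ranked = sort { $depth[$b] <=> $depth[$a] || $a <=> $b } 0 .. $gaps - 1;
    my $k      = min($num_breaks, $gaps);
    my @breaks = sort { $a <=> $b } @ranked[0 .. $k - 1];

    # 4. Join the paragraphs between consecutive breaks into segments.
    my @segments;
    my $start = 0;
    for my $gap (@breaks, $#paras) {
        push @segments, join "\n\n", @paras[$start .. $gap];
        $start = $gap + 1;
    }
    return @segments;
}

my $demo = join "\n\n",
    "cats purr and cats nap",
    "cats nap and purr softly",
    "stocks fell as markets slid",
    "markets slid while stocks fell";

# The break falls after the second paragraph, where the vocabulary changes.
my @segs = tile_segments(1, $demo);
print join("\n----------SEGMENT_BREAK----------\n", @segs), "\n";
```

Ranking gaps by depth rather than by raw similarity favors places where similarity falls and then recovers, which is the valley shape TextTiling looks for; a gap inside a long stretch of uniformly low similarity gets a depth near zero.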

EXTENDING

This module is designed to be easy to extend: you can subclass it to implement alternative methods of text segmentation.
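
The overridable methods of this module are not documented here, so the sketch below does not subclass it. It only shows a hypothetical alternative method, in a made-up package My::Segmenter::EvenSplit, that keeps the same calling convention as segments() in the SYNOPSIS: it ignores vocabulary entirely and spreads paragraphs as evenly as possible across the requested number of segments.

```perl
use strict;
use warnings;

# Hypothetical alternative segmenter with the same interface as segments()
# in the SYNOPSIS: ($num_segment_breaks, $text) -> list of segment strings.
package My::Segmenter::EvenSplit;

sub segments {
    my ($num_breaks, $text) = @_;
    my @paras  = grep { /\S/ } split /\n[ \t]*\n/, $text;
    my $n_segs = $num_breaks + 1;
    $n_segs = @paras if $n_segs > @paras;   # at most one paragraph per segment
    $n_segs = 1      if $n_segs < 1;

    my @segments;
    my $start = 0;
    for my $s (1 .. $n_segs) {
        # Last paragraph index of segment $s; any remainder is spread out.
        my $end = int($s * @paras / $n_segs) - 1;
        push @segments, join "\n\n", @paras[$start .. $end];
        $start = $end + 1;
    }
    return @segments;
}

package main;

# Two segments: paragraphs 1-2 and paragraphs 3-5.
my @segs = My::Segmenter::EvenSplit::segments(1, join "\n\n", qw(one two three four five));
print join("\n----------SEGMENT_BREAK----------\n", @segs), "\n";
```

Keeping the same argument order and return value means either function can be dropped into the SYNOPSIS code unchanged, which makes side-by-side comparison of segmentation methods straightforward.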

AUTHORS

David James <splice@cpan.org>

SEE ALSO

Lingua::EN::Segmenter::Baseline, Lingua::EN::Segmenter::Evaluator, http://www.cs.toronto.edu/~james

LICENSE

  Copyright (c) 2002 David James
  All rights reserved.
  This program is free software; you can redistribute it and/or
  modify it under the same terms as Perl itself.

