HTML::Extract - Perl extension for getting text and HTML snippets out of HTML pages in general.


HTML-Extract documentation. Contained in the HTML-Extract distribution.

NAME

HTML::Extract - Perl extension for getting text and HTML snippets out of HTML pages in general.

SYNOPSIS

  use HTML::Extract;
  my $extractor = HTML::Extract->new;
  # return a text version of the content of the body element;
  # the URI and the key=value options are passed as quoted strings
  print $extractor->gethtml('http://uri/', 'tagname=body', 'returntype=text');

  


DESCRIPTION

This is a simple Perl module for getting text out of HTML pages. It is designed to be called from any program that would otherwise need its own way of stripping away parts of a web page, for example when extracting pieces of text in order to place them elsewhere. It also comes with a small demonstration program that shows how it can be wrapped as a command-line program.
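
As a rough illustration of the command-line wrapping mentioned above, the following is a minimal sketch and not the demonstration program shipped with the distribution. The helper names build_args and fetch, the argument order, and the defaults (body, text) are assumptions made for this example; only the gethtml call and its key=value arguments come from the SYNOPSIS.

```perl
#!/usr/bin/perl
# Hypothetical command-line wrapper around HTML::Extract.
# Usage (assumed for this sketch): extract.pl URI [TAGNAME] [RETURNTYPE]
use strict;
use warnings;

# Build the argument list in the key=value string form used in the SYNOPSIS.
# The defaults are assumptions taken from that example, not documented behaviour.
sub build_args {
    my ($uri, $tag, $type) = @_;
    $tag  = 'body' unless defined $tag;
    $type = 'text' unless defined $type;
    return ($uri, "tagname=$tag", "returntype=$type");
}

# Fetch the page and return the extracted content.
sub fetch {
    my @cli_args = @_;
    require HTML::Extract;    # loaded lazily, so build_args works without it
    my $extractor = HTML::Extract->new;
    return $extractor->gethtml(build_args(@cli_args));
}

# Run only when a URI was given on the command line.
print fetch(@ARGV) if @ARGV;
```

Saved as extract.pl (a hypothetical file name), this could be invoked as `perl extract.pl http://uri/ body text`, which performs the same call as the SYNOPSIS example.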

EXPORT

None.

SEE ALSO

This module relies on several other modules to do its work: HTML::Element, HTML::TreeBuilder, HTML::TagFilter, LWP::UserAgent and LWP::Simple.

AUTHOR

Emma Tonkin, <cselt@users.sourceforge.net>

COPYRIGHT AND LICENSE
