OVERVIEW

The HTML-Parser distribution is is a collection of modules that parse and extract information from HTML documents. The modules present in this collection are:

HTML::Parser - The parser base class. It receives arbitrary sized

        chunks of the HTML text, recognizes markup elements, and
        separates them from the plain text.  As different kinds of markup
        and text are recognized, the corresponding event handlers are
        invoked.

HTML::Entities - Provides functions to encode and decode text with

embedded HTML <entities>.

HTML::HeadParser - A lightweight HTML::Parser subclass that extracts

information from the <HEAD> section of an HTML document.

HTML::LinkExtor - An HTML::Parser subclass that extracts links from

an HTML document.

HTML::PullParser - An alternative interface to the basic parser

that does not require event driven programming.

HTML::TokeParser - An HTML::PullParser subclass with fixed

        token setup and methods for extracting text.  Many simple
        parsing needs are probably best attacked with this module.

In addition take a look at the HTML-Tree package that build on HTML::Parser to create and extract information from HTML syntax trees (similar to HTML DOM).

PREREQUISITES

In order to install and use this package you will need Perl version 5.6 or better. The HTML::Tagset module should be installed.

If you intend to use the HTML::HeadParser you probably want to install libwww-perl too.

INSTALLATION

Just follow the usual procedure:

perl Makefile.PL
make
make test
make install

REPORTING BUGS

Bug reports and issues for discussion about these modules can be sent to the <libwww@perl.org> mailing list.

COPYRIGHT

© 1995-2009 Gisle Aas. All rights reserved. © 1999-2000 Michael A. Chase. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.