Web::Scraper::LibXML - Drop-in replacement for Web::Scraper to use LibXML


Web-Scraper documentation Contained in the Web-Scraper distribution.

NAME

Web::Scraper::LibXML - Drop-in replacement for Web::Scraper to use LibXML

SYNOPSIS

  use Web::Scraper::LibXML;

  # same as Web::Scraper
  my $scraper = scraper { ... };
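
As a slightly fuller illustration of the synopsis, the following is a minimal sketch that scrapes an inline HTML string. The markup, the `li.item` selector, and the `items` key are made up for this example; only the `use` line differs from what you would write with plain Web::Scraper.

```perl
use strict;
use warnings;
use Web::Scraper::LibXML;

# Hypothetical sample markup, used only for this illustration.
my $html = <<'HTML';
<html><body>
<ul>
  <li class="item">one</li>
  <li class="item">two</li>
</ul>
</body></html>
HTML

# Same DSL as Web::Scraper: collect the text of every li.item element.
my $scraper = scraper {
    process 'li.item', 'items[]' => 'TEXT';
};

# Passing a scalar reference tells scrape() to treat the value as HTML
# content rather than fetching it from a URI.
my $res = $scraper->scrape(\$html);

print "$_\n" for @{ $res->{items} };
```

The scraper object is blessed into Web::Scraper::LibXML, so the tree for this scrape is built by HTML::TreeBuilder::LibXML.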

DESCRIPTION

Web::Scraper::LibXML is a drop-in replacement for Web::Scraper that uses the fast libxml-based HTML tree builder, HTML::TreeBuilder::LibXML.

The effect is almost identical to calling HTML::TreeBuilder::LibXML's replace_original installer:

  use HTML::TreeBuilder::LibXML;
  HTML::TreeBuilder::LibXML->replace_original();

  use Web::Scraper;
  my $scraper = scraper { ... };
  # this code now uses the LibXML parser

which overrides HTML::TreeBuilder::XPath's new() constructor so that ALL of your code using HTML::TreeBuilder::XPath is switched to the libxml-based parser.

This module, by contrast, affects only the scrapers created through it, so you can choose which TreeBuilder to use for each scraper, for example depending on the site being scraped.
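
One way to mix both backends in a single program is sketched below. It assumes that each module's exported `scraper` blesses the scraper into the class that was imported, so keeping the two imports in separate packages keeps them apart. The package names `My::Scrapers::Fast` and `My::Scrapers::Default` and the sample HTML are hypothetical.

```perl
use strict;
use warnings;

{
    # Hypothetical package for sites where parsing speed matters:
    # scrapers defined here build their tree with HTML::TreeBuilder::LibXML.
    package My::Scrapers::Fast;
    use Web::Scraper::LibXML;

    sub title_scraper {
        return scraper { process 'title', title => 'TEXT'; };
    }
}

{
    # Hypothetical package for sites that should keep the default
    # HTML::TreeBuilder::XPath parser.
    package My::Scrapers::Default;
    use Web::Scraper;

    sub title_scraper {
        return scraper { process 'title', title => 'TEXT'; };
    }
}

# Made-up sample document for the illustration.
my $html = '<html><head><title>Example</title></head>'
         . '<body><p>hello</p></body></html>';

for my $class (qw(My::Scrapers::Fast My::Scrapers::Default)) {
    my $scraper = $class->title_scraper;
    my $res     = $scraper->scrape($html);
    printf "%s uses %s: %s\n", $class, ref $scraper, $res->{title};
}
```

Unlike replace_original, nothing global is changed here: HTML::TreeBuilder::XPath keeps its original constructor, and only scrapers created in the first package go through the libxml-based builder.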

SEE ALSO

Web::Scraper, HTML::TreeBuilder::LibXML


package Web::Scraper::LibXML;
use strict;
use base qw( Web::Scraper );

use HTML::TreeBuilder::LibXML;

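# Override Web::Scraper's build_tree so the HTML is parsed with
# HTML::TreeBuilder::LibXML instead of HTML::TreeBuilder::XPath.
sub build_tree {
    my($self, $html) = @_;

    my $t = HTML::TreeBuilder::LibXML->new;
    $t->parse($html);
    $t->eof;    # signal end of input so the tree is finalized
    return $t;
}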

1;

__END__