| SWISH-Prog documentation | Contained in the SWISH-Prog distribution. |
SWISH::Prog::Aggregator::Spider::UA - spider user agent
use SWISH::Prog::Aggregator::Spider::UA;
my $ua = SWISH::Prog::Aggregator::Spider::UA->new;
# $ua is a WWW::Mechanize object
SWISH::Prog::Aggregator::Spider::UA is a subclass of WWW::Mechanize.
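get( uri, delay ) overrides the base method: if delay is given and true, it calls sleep() for delay seconds before fetching uri. Croaks if uri is missing.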
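The delay-then-fetch behavior can be sketched offline as follows. This is only an illustration of the override pattern, not the module itself: `My::StubAgent` and `My::PoliteAgent` are hypothetical packages, with the stub standing in for WWW::Mechanize so that no network access is needed, and the URL is a placeholder.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical stand-in for WWW::Mechanize so the sketch runs offline.
package My::StubAgent;
sub new { return bless {}, shift }
sub get { my ( $self, $uri ) = @_; return "fetched $uri" }

# Hypothetical subclass: optional sleep, then hand off to the parent get().
package My::PoliteAgent;
use Carp;
our @ISA = ('My::StubAgent');

sub get {
    my $self  = shift;
    my $uri   = shift or croak "URI required";
    my $delay = shift;
    sleep($delay) if $delay;
    return $self->SUPER::get($uri);
}

package main;

my $ua      = My::PoliteAgent->new;
my $started = Time::HiRes::time();
my $result  = $ua->get( 'http://example.com/', 1 );    # pauses about 1 second
my $elapsed = Time::HiRes::time() - $started;

# Calling get() without a URI croaks; capture the message instead of dying.
my $no_uri_error = eval { $ua->get(); 1 } ? '' : $@;

printf "%s after %.1fs\n", $result, $elapsed;
print "error: $no_uri_error";
```

Because the delay is a per-call argument rather than object state, the caller decides how long to pause before each request.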
title() returns the document title, or nothing if the response is not HTML. Overrides the base method to verify that the UTF-8 flag is set correctly on the response content before the content is parsed.
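The decode-before-parse step can be tried in isolation with a minimal sketch like the one below. It assumes HTML::HeadParser is installed, uses `Encode::decode` as a stand-in for the module's own UTF-8 helper (an assumption made to keep the example free of extra dependencies), and parses a made-up HTML string rather than a fetched response.

```perl
use strict;
use warnings;
use Encode qw( decode );
use HTML::HeadParser;

# Raw UTF-8 bytes, as they might arrive in a response body (title "Café").
my $bytes
    = "<html><head><title>Caf\xC3\xA9</title></head><body></body></html>";

# Assumption: Encode::decode stands in for the module's UTF-8 helper.
# Decoding turns the byte string into a Perl character string.
my $chars = decode( 'UTF-8', $bytes );

# Parse the decoded characters and read the title from the collected headers.
my $p = HTML::HeadParser->new;
$p->parse($chars);
my $title = $p->header('Title');

binmode STDOUT, ':encoding(UTF-8)';
print "$title\n";
```

Handing the parser decoded characters rather than raw bytes is what lets the accented character survive as a single character in the returned title.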
Peter Karman, <perl@peknet.com>
Please report any bugs or feature requests to bug-swish-prog at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=SWISH-Prog. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc SWISH::Prog
Copyright 2008-2009 by Peter Karman
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
package SWISH::Prog::Aggregator::Spider::UA;
use strict;
use warnings;
use base qw( WWW::Mechanize );
use Carp;
use Data::Dump qw( dump );
use Search::Tools::UTF8;

sub get {
    my $self  = shift;
    my $uri   = shift or croak "URI required";
    my $delay = shift;
    if ($delay) {
        sleep($delay);
    }
    return $self->SUPER::get($uri);
}

sub title {
    my $self = shift;
    return unless $self->is_html;

    require HTML::HeadParser;
    my $p = HTML::HeadParser->new;

    # the standard title() method does not check to see if utf-8 is
    # flagged as such by perl, and so HTML::HeadParser throws warning.
    # So we trust the content-type header and
    # verify that the utf-8 flag is on.
    if ( $self->response->header('content-type') =~ m/utf-8/i ) {
        $p->parse( to_utf8( $self->content ) );
    }
    else {
        $p->parse( $self->content );
    }
    return $p->header('Title');
}

1;

__END__