SWISH::Prog::Aggregator::Spider::UA - spider user agent


SWISH-Prog documentation Contained in the SWISH-Prog distribution.

NAME

SWISH::Prog::Aggregator::Spider::UA - spider user agent

SYNOPSIS

 use SWISH::Prog::Aggregator::Spider::UA;
 my $ua = SWISH::Prog::Aggregator::Spider::UA->new;

 # $ua is a WWW::Mechanize object

DESCRIPTION

SWISH::Prog::Aggregator::Spider::UA is a subclass of WWW::Mechanize.

METHODS

get( uri, delay )

Sleeps for delay seconds, if delay is given and true, before fetching uri with the base class get(). Croaks if uri is missing.
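
The delay-then-fetch behaviour can be tried without network access. The following is a minimal sketch, not the module itself: My::StubBase is a hypothetical stand-in for WWW::Mechanize that only records the URIs it is asked to fetch, and My::DelayedUA is a hypothetical subclass whose get() mirrors the override described here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for WWW::Mechanize: records requests instead of
# performing them, so the sketch runs without network access.
package My::StubBase;

sub new { return bless { fetched => [] }, shift }

sub get {
    my ( $self, $uri ) = @_;
    push @{ $self->{fetched} }, $uri;
    return "fetched $uri";
}

# Hypothetical subclass that mirrors the get( uri, delay ) override.
package My::DelayedUA;
use Carp;
our @ISA = ('My::StubBase');

sub get {
    my $self  = shift;
    my $uri   = shift or croak "URI required";
    my $delay = shift;
    sleep($delay) if $delay;    # a false or missing delay skips the pause
    return $self->SUPER::get($uri);
}

package main;

my $ua      = My::DelayedUA->new;
my $started = time;
my $result  = $ua->get( 'http://swish-e.org/', 1 );    # pause about 1 second
my $elapsed = time - $started;
print "$result after about $elapsed second(s)\n";
```

With the real class the call has the same shape, $ua->get( $uri, $delay ), and the return value is whatever WWW::Mechanize's get() returns.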

title

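Returns the document title, or nothing if the response is not HTML. Overrides the base method: when the Content-Type header mentions UTF-8, the content is passed through to_utf8() so that Perl's UTF-8 flag is set before HTML::HeadParser parses it.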
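
The effect of decoding before parsing can be seen in a small, dependency-free sketch. It is an illustration under stated assumptions, not the module's code: title_from() is a hypothetical helper, Encode's decode() is used as a rough stand-in for Search::Tools::UTF8's to_utf8(), and a simple regex replaces HTML::HeadParser purely to keep the example to core modules.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Hypothetical helper: decode the body only when the Content-Type header
# advertises UTF-8, then extract the <title> text.
sub title_from {
    my ( $content_type, $bytes ) = @_;

    # Guard against a missing header before matching on it.
    my $html
        = ( defined $content_type && $content_type =~ m/utf-8/i )
        ? decode( 'UTF-8', $bytes )    # stand-in for to_utf8()
        : $bytes;

    # Simplistic stand-in for HTML::HeadParser; not a real HTML parser.
    my ($title) = $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is;
    return $title;
}

# "Cafe" with an acute accent on the final letter, as raw UTF-8 bytes.
my $bytes = "<html><head><title>Caf\xC3\xA9</title></head></html>";

my $decoded = title_from( 'text/html; charset=UTF-8', $bytes );
my $raw     = title_from( 'text/html',                $bytes );

binmode STDOUT, ':encoding(UTF-8)';
print "decoded title: $decoded\n";
print "decoded length: ", length($decoded), "\n";
print "undecoded length: ", length($raw), "\n";
```

The two lengths differ because the decoded title is counted in characters while the undecoded one is counted in bytes, which is the mismatch the override is meant to avoid.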

AUTHOR

Peter Karman, <perl@peknet.com>

BUGS

Please report any bugs or feature requests to bug-swish-prog at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=SWISH-Prog. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc SWISH::Prog




You can also look for information at:

* RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=SWISH-Prog

* AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/SWISH-Prog

* CPAN Ratings

http://cpanratings.perl.org/d/SWISH-Prog

* Search CPAN

http://search.cpan.org/dist/SWISH-Prog/

COPYRIGHT AND LICENSE

SEE ALSO

http://swish-e.org/


package SWISH::Prog::Aggregator::Spider::UA;
use strict;
use warnings;
use base qw( WWW::Mechanize );
use Carp;
use Data::Dump qw( dump );
use Search::Tools::UTF8;

sub get {
    my $self  = shift;
    my $uri   = shift or croak "URI required";
    my $delay = shift;
    if ($delay) {
        sleep($delay);
    }
    return $self->SUPER::get($uri);
}

sub title {
    my $self = shift;
    return unless $self->is_html;

    require HTML::HeadParser;
    my $p = HTML::HeadParser->new;

    # the standard title() method does not check to see if utf-8 is
    # flagged as such by perl, and so HTML::HeadParser throws warning.
    # So we trust the content-type header and
    # verify that the utf-8 flag is on.
    # fall back to '' so a missing content-type header does not warn
    if ( ( $self->response->header('content-type') || '' ) =~ m/utf-8/i ) {
        $p->parse( to_utf8( $self->content ) );
    }
    else {
        $p->parse( $self->content );
    }
    return $p->header('Title');
}

1;

__END__