WWW::Pastebin::PastebinCom::Retrieve - retrieve pastes from http://pastebin.com/ website


WWW-Pastebin-PastebinCom-Retrieve documentation Contained in the WWW-Pastebin-PastebinCom-Retrieve distribution.

Index


Code Index:

NAME

Top

WWW::Pastebin::PastebinCom::Retrieve - retrieve pastes from http://pastebin.com/ website

SYNOPSIS

Top

    use strict;
    use warnings;

    use lib '../lib';
    use WWW::Pastebin::PastebinCom::Retrieve;

    die "Usage: perl retrieve.pl <paste_ID_or_URI>\n"
        unless @ARGV;

    my $Paste = shift;

    my $paster = WWW::Pastebin::PastebinCom::Retrieve->new;

    my $results_ref = $paster->retrieve( $Paste )
        or die $paster->error;

    printf "Paste content is:\n%s\nPasted by %s on %s\n",
            @$results_ref{ qw(content name posted_on) };

DESCRIPTION

Top

The module provides interface to retrieve pastes from http://pastebin.com/ website via Perl.

CONSTRUCTOR

Top

new

    my $paster = WWW::Pastebin::PastebinCom::Retrieve->new;

    my $paster = WWW::Pastebin::PastebinCom::Retrieve->new(
        timeout => 10,
    );

    my $paster = WWW::Pastebin::PastebinCom::Retrieve->new(
        ua => LWP::UserAgent->new(
            timeout => 10,
            agent   => 'PasterUA',
        ),
    );

Constructs and returns a brand new juicy WWW::Pastebin::PastebinCom::Retrieve object. Takes two arguments, both are optional. Possible arguments are as follows:

timeout

    ->new( timeout => 10 );

Optional. Specifies the timeout argument of LWP::UserAgent's constructor, which is used for retrieving. Defaults to: 30 seconds.

ua

    ->new( ua => LWP::UserAgent->new( agent => 'Foos!' ) );

Optional. If the timeout argument is not enough for your needs of mutilating the LWP::UserAgent object used for retrieving, feel free to specify the ua argument which takes an LWP::UserAgent object as a value. Note: the timeout argument to the constructor will not do anything if you specify the ua argument as well. Defaults to: plain boring default LWP::UserAgent object with timeout argument set to whatever WWW::Pastebin::PastebinCom::Retrieve's timeout argument is set to as well as agent argument is set to mimic Firefox.

METHODS

Top

retrieve

    my $results_ref = $paster->retrieve('http://pastebin.com/f525c4cec')
        or die $paster->error;

    my $results_ref = $paster->retrieve('f525c4cec')
        or die $paster->error;

Instructs the object to retrieve a paste specified in the argument. Takes one mandatory argument which can be either a full URI to the paste you want to retrieve or just its ID. On failure returns either undef or an empty list depending on the context and the reason for the error will be available via error() method. On success returns a hashref with the following keys/values:

    $VAR1 = {
        'lang' => 'Perl',
        'posted_on' => 'Sat 22 Mar 16:07',
        'content' => 'blah blah content of the paste',
        'name' => 'Zoffix'
    };

content

    { 'content' => 'blah blah content of the paste', }

The content key will contain the actual content of the paste. See also content() method which is overloaded for this class.

lang

    { 'lang' => 'Perl' }

The lang key will contain the (computer) language of the paste (as specified by the person who pasted it)

posted_on

    { 'posted_on' => 'Sat 22 Mar 16:07', }

The posted_on key will contain the date/time when the paste was created.

name

    { 'name' => 'Zoffix' }

The name key will contain the name of the person who created the paste.

error

    $paster->retrieve('http://pastebin.com/f525c4cec')
        or die $paster->error;

On failure retrieve() returns either undef or an empty list depending on the context and the reason for the error will be available via error() method. Takes no arguments, returns an error message explaining the failure.

id

    my $paste_id = $paster->id;

Must be called after a successful call to retrieve(). Takes no arguments, returns a paste ID number of the last retrieved paste irrelevant of whether an ID or a URI was given to retrieve()

uri

    my $paste_uri = $paster->uri;

Must be called after a successful call to retrieve(). Takes no arguments, returns a URI object with the URI pointing to the last retrieved paste irrelevant of whether an ID or a URI was given to retrieve()

results

    my $last_results_ref = $paster->results;

Must be called after a successful call to retrieve(). Takes no arguments, returns the exact same hashref the last call to retrieve() returned. See retrieve() method for more information.

content

    my $paste_content = $paster->content;

    print "Paste content is:\n$paster\n";

Must be called after a successful call to retrieve(). Takes no arguments, returns the actual content of the paste. Note: this method is overloaded for this module for interpolation. Thus you can simply interpolate the object in a string to get the contents of the paste.

ua

    my $old_LWP_UA_obj = $paster->ua;

    $paster->ua( LWP::UserAgent->new( timeout => 10, agent => 'foos' );

Returns a currently used LWP::UserAgent object used for retrieving pastes. Takes one optional argument which must be an LWP::UserAgent object, and the object you specify will be used in any subsequent calls to retrieve().

SEE ALSO

Top

LWP::UserAgent, URI

AUTHOR

Top

Zoffix Znet, <zoffix at cpan.org> (http://zoffix.com, http://haslayout.net)

BUGS

Top

Please report any bugs or feature requests to bug-www-pastebin-pastebincom-retrieve at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=WWW-Pastebin-PastebinCom-Retrieve. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

Top

You can find documentation for this module with the perldoc command.

    perldoc WWW::Pastebin::PastebinCom::Retrieve

You can also look for information at:

* RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=WWW-Pastebin-PastebinCom-Retrieve

* AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/WWW-Pastebin-PastebinCom-Retrieve

* CPAN Ratings

http://cpanratings.perl.org/d/WWW-Pastebin-PastebinCom-Retrieve

* Search CPAN

http://search.cpan.org/dist/WWW-Pastebin-PastebinCom-Retrieve

COPYRIGHT & LICENSE

Top


WWW-Pastebin-PastebinCom-Retrieve documentation Contained in the WWW-Pastebin-PastebinCom-Retrieve distribution.

package WWW::Pastebin::PastebinCom::Retrieve;

use warnings;
use strict;

our $VERSION = '0.002';

use URI;
use HTML::TokeParser::Simple;
use HTML::Entities;
use base 'WWW::Pastebin::Base::Retrieve';

sub retrieve {
    my $self = shift;
    my $id   = shift;

    $self->$_(undef) for qw(error uri id results);
    
    return $self->_set_error('Missing or empty paste ID or URI')
        unless defined $id and length $id;

    ( my $uri, $id ) = $self->_make_uri_and_id( $id, @_ )
        or return;

    $self->id( $id );
    $self->uri( $uri );

    my $ua = $self->ua;
    my $response = $ua->get( $uri );
    if (
        $response->is_success
        or $response->code == 404 # and just WHY they thought giving 404s
        # on existing pastes is such a great idea?
    ) {
        return $self->_get_was_successful( $response->content );
    }
    else {
        return $self->_set_error('Network error: ' . $response->status_line);
    }
}

sub _make_uri_and_id {
    my ( $self, $what ) = @_;

    my ( $private, $id ) = $what =~ m{
                (?:http://)?
                (?:www\.)?
                (.*?) # "private paste" subdomain
                pastebin\.com/
                (\w+) # paste ID
        }xi;

    $id = $what
        unless defined $id and length $id;

    $private = ''
        unless defined $private;

    return ( URI->new("http://${private}pastebin.com/$id"), $id );
}

sub _parse {
    my ( $self, $content ) = @_;

    # yes, they could've given 200s on existing pastes and 404s on
    # non-existant, but NO!! 404s for EVERYONE... yey \o/
    # that calls for urgent parsing of HTML with regexen WEEEEEEE
    $content =~ m|<TITLE>404 Not Found</TITLE>|
        and $content !~ /<div>/
        and return $self->_set_error('This paste does not seem to exist');

    my $parser = HTML::TokeParser::Simple->new( \$content );
    
    my ( %data, %nav );
    @nav{ qw(level  start  get_name_date  get_lang  get_content) }
    = (0) x 5;

    while ( my $t = $parser->get_token ) {
        if ( $t->is_start_tag('div')
            and defined $t->get_attr('id')
            and $t->get_attr('id') eq 'content'
        ) {
            @nav{ qw(level start) } = (1, 1);
        }
        elsif ( $nav{start} == 1 and $t->is_start_tag('h1') ) {
            @nav{ qw(level get_name_date) } = (2, 1);
        }
        elsif ( $nav{get_name_date} == 1 and $t->is_text ) {
            @data{ qw(name posted_on) } = $t->as_is
            =~ /Posted by (.+) on (.+)/;

            @nav{ qw(level get_name_date) } = (3, 0);
        }
        elsif ( $nav{start} == 1
            and $t->is_start_tag('option')
            and defined $t->get_attr('selected')
        ) {
            @nav{ qw(level get_lang) } = (4, 1);
        }
        elsif ( $nav{get_lang} == 1 and $t->is_text ) {
            @nav{ qw(level get_lang) } = (5, 0);
            $data{lang} = $t->as_is;
        }
        elsif ( $nav{start} == 1
            and $t->is_start_tag('textarea')
            and defined $t->get_attr('id')
            and $t->get_attr('id') eq 'code'
        ) {
            @nav{ qw(level get_content) } = (6, 1);
        }
        elsif ( $nav{get_content} == 1 and $t->is_text ) {
            $data{content} = $t->as_is;
            $nav{is_success} = 1;
            last;
        }
        elsif ( $nav{get_content} == 1 and $t->is_end_tag('textarea') ) {
            return $self->_set_error('This paste does not seem to exist');
        }
    }

    unless ( $nav{is_success} ) {
        return $self->_set_error (
            "Parser error (level $nav{level}).\n"
                . "Failed on content:\n$content"
        );
    }

    for ( values %data ) {
        unless ( defined and length ) {
            $_ = 'N/A';
            next;
        }
        decode_entities $_;
        s/\240/ /g;
    }

    $self->content( $data{content} );
    return \%data;
}