| LWP-UserAgent-ProxyHopper documentation | view source | Contained in the LWP-UserAgent-ProxyHopper distribution. |
LWP::UserAgent::ProxyHopper - LWP::UserAgent with proxy-hopping
    use strict;
    use warnings;
    use LWP::UserAgent::ProxyHopper;

    my $ua = LWP::UserAgent::ProxyHopper->new( agent => 'fox', timeout => 10 );

    # Load the proxy list once before making any proxify_* request
    $ua->proxify_load;

    for ( 1..5 ) {
        my $response = $ua->proxify_get('http://www.privax.us/ip-test/');

        if ( $response->is_success ) {
            my $content = $response->content;
            if ( my ( $ip ) = $content
                =~ m|<p>.+?IP Address:\s*</strong>\s*(.+?)\s+|s
            ) {
                printf "\n\nSuccess!!! \n%s\n", $ip;
            }
            else {
                printf "Response is successful but seems like we got a wrong "
                    . "page... here is what we got:\n%s\n", $content;
            }
        }
        else {
            print '[script] Network error: ' . $response->status_line . "\n";
        }
    }
The module is a subclass of LWP::UserAgent that adds extra functionality for making proxy-hopping requests. In other words, each request can be sent out through a different proxy server.
Don't get your hopes up too high unless you can feed the module fast proxies that work 100% of the time. The module does some basic checks on whether a request succeeded and blacklists proxies that appear to be really bad, but there is still a good chance that either (a) your request will time out after several tries or, worse, (b) your request will succeed but return something other than what you expect, because some proxies tend to drop garbage on you. Your mileage will vary with the settings; it is a speed-for-quality trade-off.
HOW IT WORKS
The module fetches a list of proxy servers (see the proxify_load() method).
When one of the proxify_*() request methods is called, the module takes a
proxy from the list and tries to make your request through it. If the
request succeeds, the module checks the response for a couple of signs of a
"this is not what you wanted" proxy and, if it finds any, retries the
request with a different proxy. If this check does not raise any suspicion,
the result (an HTTP::Response object) is returned to you and the proxy that
was used is put into a "working" list. If the request fails, the module
does a basic check on the returned status code to decide whether to put the
proxy into a "bad" list or a "real_bad" list, after which it retries.
The number of retries depends on the retries argument of the
proxify_load() method.
When the original proxy list is exhausted, the module builds a new list
from the proxies it previously listed as "working". If that list is empty,
it falls back to the "bad" list, which might still contain working proxies.
The "real_bad" list is never used. If neither the "working" nor the "bad"
list has any proxies left, the module calls proxify_load() automatically
with the same arguments you last passed to it, so your program can run for
a long time with just one call to proxify_load() during startup.
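The fallback order described above can be pictured with a small, self-contained sketch. This is not the module's actual code: the next_proxy() function, the hash layout, and the 192.0.2.x addresses are all hypothetical, and the sketch models only the order in which lists are consulted, not how proxies are moved into the "working" or "bad" lists after a request.

```perl
use strict;
use warnings;

# Hypothetical illustration of the selection order only:
# main list first, then "working", then "bad"; "real_bad" is never reused.
# Returns undef when the module would have to call proxify_load() again.
sub next_proxy {
    my ($lists) = @_;

    for my $source (qw(working bad)) {
        last if @{ $lists->{main} };

        # Main list is exhausted: refill it from the next fallback list.
        $lists->{main}    = [ @{ $lists->{$source} } ];
        $lists->{$source} = [];
    }

    return shift @{ $lists->{main} };
}

# Made-up proxies from the documentation-only 192.0.2.0/24 range.
my %lists = (
    main     => ['http://192.0.2.1:8080/'],
    working  => ['http://192.0.2.2:8080/'],
    bad      => ['http://192.0.2.3:8080/'],
    real_bad => ['http://192.0.2.4:8080/'],
);

for ( 1 .. 4 ) {
    my $proxy = next_proxy( \%lists );
    print defined $proxy ? "$proxy\n" : "reload needed\n";
}
```

The fourth call finds every usable list empty, which corresponds to the point where the module would reload the proxy list on its own.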
The module is a subclass of LWP::UserAgent, so you can use any
LWP::UserAgent method as you would before.
All the methods added by this module are prefixed with proxify_.
proxify_load
    $your_ua->proxify_load; # plain defaults

    $your_ua->proxify_load( # juicy override
        freeproxylists => 1,
        plan_b         => 1,
        proxy4free     => 0,
        timeout        => 20,
        debug          => 0,
        retries        => 5,
        extra_proxies  => [],
        schemes        => [ 'http', 'ftp' ],
        get_list_args  => {
            freeproxylists => [ type => 'anonymous' ],
            proxy4free     => [ [2,3] ],
        },
    );
Instructs the object to load a list of proxies. You must call this
method at least once before calling any of the proxify_* request methods.
The return value is an arrayref of proxy addresses in the form
"http://122.122.122.122:8080/". The method will croak() if the proxy list
is still empty after trying to fetch proxy lists and after adding
extra_proxies (see below). The method takes quite a few arguments, all of
which are given in a key/value fashion. All of them are optional. Possible
arguments are as follows:
freeproxylists
    $your_ua->proxify_load( freeproxylists => 1 );
Optional. The module uses the WWW::FreeProxyLists::Com and
WWW::Proxy4FreeCom modules to get the proxy list. If you set the
freeproxylists argument to a false value, the module will not attempt
to load any proxies from the http://freeproxylists.com/ website.
Defaults to: 1
proxy4free
    $your_ua->proxify_load( proxy4free => 0 );
Optional. The module uses the WWW::FreeProxyLists::Com and
WWW::Proxy4FreeCom modules to get the proxy list. If you set the
proxy4free argument to a false value (which is the default),
the module will not attempt to load any proxies from the
http://www.proxy4free.com/ website. Defaults to: 0
plan_b
    $your_ua->proxify_load( plan_b => 1 );
Optional. When set to a true value, enables a "Plan B" mechanism:
when plan_b and freeproxylists are both set to true values
and the fetch from http://freeproxylists.com/ does not yield any proxies,
the module will fetch a list from the http://www.proxy4free.com/ website
regardless of whether proxy4free is set to a true value. In
other words, this is a fallback for the case where
http://freeproxylists.com is down and proxy4free has been set to a false
value to speed up the loading of the proxy list. Defaults to: 1 (enabled)
timeout
    $your_ua->proxify_load( timeout => 20 );
Optional. Takes a positive integer that will be passed to the
WWW::FreeProxyLists::Com and WWW::Proxy4FreeCom constructors as
the timeout argument. In other words, this specifies the timeout for
fetching the proxy lists. Defaults to: 20
retries
    $your_ua->proxify_load( retries => 5 );
Optional. Specifies how many times the module should retry a
proxify_* request if the response does not look successful.
Generally, a higher retries value yields more reliable requests
but also slows down the request process.
See the HOW IT WORKS section above to get an idea of when the module
will retry the request. Defaults to: 5
extra_proxies
    $your_ua->proxify_load( extra_proxies => [] );
Optional. Takes an arrayref of proxy addresses in a format acceptable
to LWP::UserAgent's proxy() method. These are extra proxies that you
provide yourself. In particular, you can set the freeproxylists
and plan_b arguments to false values and put your own proxies
into the extra_proxies arrayref, in which case the module will not even
attempt to fetch any lists from proxy list sites (i.e. the loading will
be much faster). Defaults to: [] (no extra proxies)
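As a minimal sketch of the "own proxies only" setup described above, the helper below wraps the proxify_load() call with the list sites switched off. The load_own_proxies() name is hypothetical, and the helper assumes it is handed an LWP::UserAgent::ProxyHopper object; only arguments documented in this section are used.

```perl
use strict;
use warnings;

# Hypothetical helper: load only the caller's own proxies, skipping both
# proxy list sites. $ua is assumed to be an LWP::UserAgent::ProxyHopper
# object; @proxies are addresses acceptable to LWP::UserAgent's proxy().
sub load_own_proxies {
    my ( $ua, @proxies ) = @_;

    return $ua->proxify_load(
        freeproxylists => 0,    # do not fetch from freeproxylists.com
        plan_b         => 0,    # no fallback fetch either
        proxy4free     => 0,    # already the default, stated for clarity
        extra_proxies  => \@proxies,
    );
}
```

It could be called as load_own_proxies( $ua, 'http://192.0.2.10:8080/' ), where the address is a placeholder from the documentation-only 192.0.2.0/24 range. Keep in mind that proxify_load() croaks when the resulting proxy list is empty, so at least one proxy must be supplied.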
schemes
    $your_ua->proxify_load( schemes => [ 'http', 'ftp' ] );
    $your_ua->proxify_load( schemes => 'ftp' );
Optional. Specifies the first argument to pass to LWP::UserAgent's
proxy() method (i.e. the schemes to proxy for). Note: schemes other
than 'http' have not been tested and might not even work with
the proxy lists the module fetches by default. Defaults to: http
get_list_args
    $your_ua->proxify_load(
        get_list_args => {
            freeproxylists => [ type => 'anonymous' ],
            proxy4free     => [ [1,2] ],
        },
    );
Optional. Lets you pass specific arguments to the get_list() methods
of the WWW::FreeProxyLists::Com and WWW::Proxy4FreeCom modules used
under the hood. The get_list_args argument takes a hashref with two keys
as a value. The keys must be freeproxylists and proxy4free, and their
values must be arrayrefs of arguments to give to the get_list() methods
of the respective modules.
debug
    $your_ua->proxify_load( debug => 0 );
Optional. When set to a true value, makes the module carp() out
some debugging info (including the time of processing of any proxify_*
request method). Defaults to: 0
proxify_get
    my $response = $your_ua->proxify_get('http://something.com/');
Must be called after a successful call to the proxify_load() method.
The method is the same as LWP::UserAgent's get() method, except that
proxify_get() will switch proxies before attempting the request.
proxify_post
    my $response = $your_ua->proxify_post('http://something.com/');
Must be called after a successful call to the proxify_load() method.
The method is the same as LWP::UserAgent's post() method, except that
proxify_post() will switch proxies before attempting the request.
Note: during my tests, almost all proxies from
http://www.freeproxylist.com/ did not permit POST requests. You might
have better luck setting proxy4free to a true value, disabling the
freeproxylists argument and setting a higher retries argument (see the
proxify_load() method above).
proxify_request
    my $response = $your_ua->proxify_request( $req_obj );
Must be called after a successful call to the proxify_load() method.
The method is the same as LWP::UserAgent's request() method, except that
proxify_request() will switch proxies before attempting the request.
proxify_head
    my $response = $your_ua->proxify_head('http://something.com/');
Must be called after a successful call to the proxify_load() method.
The method is the same as LWP::UserAgent's head() method, except that
proxify_head() will switch proxies before attempting the request.
proxify_mirror
    my $response = $your_ua->proxify_mirror(
        'http://something.com/file.tar.gz',
        'here.tar.gz',
    );
Must be called after a successful call to the proxify_load() method.
The method is the same as LWP::UserAgent's mirror() method, except that
proxify_mirror() will switch proxies before attempting the request.
Note: use this method with caution, as some proxies return an HTML document
instead of the actual content you requested.
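One way to act on the caution above is to inspect the mirrored file before trusting it. The sketch below is a rough heuristic, not part of the module: looks_like_html() is a hypothetical helper that only peeks at the first bytes of a file, so it can flag an obvious HTML page saved in place of a binary download but cannot prove that a download is correct.

```perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the start of the file looks like an
# HTML document, 0 otherwise. Heuristic only; it checks nothing beyond
# the first 512 bytes.
sub looks_like_html {
    my ($path) = @_;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    read $fh, my $head, 512;
    close $fh;

    return 0 unless defined $head;
    return $head =~ /\A\s*(?:<!DOCTYPE\s+html|<html\b)/i ? 1 : 0;
}
```

After a proxify_mirror() call that reports success, a true result from looks_like_html('here.tar.gz') for a file that should be an archive suggests the proxy substituted its own page, in which case deleting the file and mirroring again is probably the safer choice.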
proxify_simple_request
    my $response = $your_ua->proxify_simple_request('http://something.com/');
Must be called after a successful call to the proxify_load() method.
The method is the same as LWP::UserAgent's simple_request() method,
except that proxify_simple_request() will switch proxies before attempting
the request.
proxify_list
    my $proxies_list_ref = $your_ua->proxify_list;
Must be called after a successful call to the proxify_load() method.
Takes no arguments, returns an arrayref of proxies used internally for
requests. This list will shrink as more requests are made (until it is
depleted and reloaded; see the HOW IT WORKS section). Note: you can
shift, push, etc. on this arrayref to dynamically set which
proxies will be used. The proxy to be used on the next proxify_* request
is the first element of this arrayref.
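Because the next proxy to be used is the first element of the arrayref, putting an address at the front is enough to have it tried first. A minimal sketch, assuming a hypothetical prefer_proxy() helper that edits the array in place (which matters, since the arrayref is the one the module uses internally):

```perl
use strict;
use warnings;

# Hypothetical helper: move (or add) $proxy to the front of the list so
# that it is tried first. The array is modified in place and any earlier
# occurrence of the same address is removed.
sub prefer_proxy {
    my ( $list_ref, $proxy ) = @_;

    @$list_ref = ( $proxy, grep { $_ ne $proxy } @$list_ref );

    return $list_ref;
}
```

With a loaded object this could be called as prefer_proxy( $your_ua->proxify_list, 'http://192.0.2.10:8080/' ), where the address is only a placeholder.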
proxify_working_list
    my $proxies_working_list_ref = $your_ua->proxify_working_list;
Must be called after a successful call to the proxify_load() method.
Takes no arguments, returns an arrayref of proxies listed as "working". See
the HOW IT WORKS section above for details. Note: you can
shift, push, etc. on this arrayref to dynamically change it.
proxify_bad_list
    my $proxies_bad_list_ref = $your_ua->proxify_bad_list;
Must be called after a successful call to the proxify_load() method.
Takes no arguments, returns an arrayref of proxies listed as "bad". See
the HOW IT WORKS section above for details. Note: you can
shift, push, etc. on this arrayref to dynamically change it.
proxify_real_bad_list
    my $proxies_real_bad_list_ref = $your_ua->proxify_real_bad_list;
Must be called after a successful call to the proxify_load() method.
Takes no arguments, returns an arrayref of proxies listed as "real bad". See
the HOW IT WORKS section above for details.
proxify_schemes
    my $used_schemes = $your_ua->proxify_schemes;
    $your_ua->proxify_schemes( [ 'http', 'ftp' ] );
Returns the currently used value of the proxify_load() method's
schemes argument. If called with an optional argument, uses it as the
new value. See the proxify_load() method above for details.
Note: the value will be reset on the next proxify_load() call, which
can happen automatically if the proxy lists are exhausted. See the
HOW IT WORKS section for details.
proxify_retries
    my $used_retries = $your_ua->proxify_retries;
    $your_ua->proxify_retries( 10 );
Returns the currently used value of the proxify_load() method's
retries argument. If called with an optional argument, uses it as the
new value.
See the proxify_load() method above for details.
Note: the value will be reset on the next proxify_load() call, which
can happen automatically if the proxy lists are exhausted. See the
HOW IT WORKS section for details.
proxify_debug
    my $used_debug = $your_ua->proxify_debug;
    $your_ua->proxify_debug( 1 );
Returns the currently used value of the proxify_load() method's
debug argument. If called with an optional argument, uses it as the
new value. See the proxify_load() method above for details.
Note: the value will be reset on the next proxify_load() call, which
can happen automatically if the proxy lists are exhausted. See the
HOW IT WORKS section for details.
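Since values set through proxify_schemes(), proxify_retries() and proxify_debug() are reset whenever proxify_load() runs again, one possible workaround is to re-apply them right before each request. The get_with_settings() helper below is a hypothetical sketch of that idea; it assumes $ua is an LWP::UserAgent::ProxyHopper object that has already been loaded, and it may not help if an automatic reload happens in the middle of a request.

```perl
use strict;
use warnings;

# Hypothetical helper: re-apply the given settings, then make a GET
# request. Recognised keys are schemes, retries and debug; anything else
# in %settings is ignored.
sub get_with_settings {
    my ( $ua, $url, %settings ) = @_;

    for my $name (qw(schemes retries debug)) {
        next unless exists $settings{$name};

        # Calls proxify_schemes(), proxify_retries() or proxify_debug().
        my $setter = "proxify_$name";
        $ua->$setter( $settings{$name} );
    }

    return $ua->proxify_get($url);
}
```

A call such as get_with_settings( $your_ua, 'http://something.com/', retries => 10 ) would then behave like proxify_get() with the retries value restored first. An alternative that avoids the problem altogether is to pass the desired values to proxify_load() itself, since an automatic reload reuses the arguments of the last call.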
proxify_current
    my $current_proxy = $your_ua->proxify_current;
Takes no arguments, returns the last proxy used in proxify_* request
methods. Why is it called "current"? Because it can change several times
during a call to a proxify_* request method, depending on the retries
argument's setting (in the proxify_load() method).
Zoffix Znet, <zoffix at cpan.org>
(http://zoffix.com, http://haslayout.net)
Please report any bugs or feature requests to bug-lwp-useragent-proxyhopper at rt.cpan.org, or through
the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=LWP-UserAgent-ProxyHopper. I will be notified, and then you'll
automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc LWP::UserAgent::ProxyHopper
You can also look for information at:
http://rt.cpan.org/NoAuth/Bugs.html?Dist=LWP-UserAgent-ProxyHopper
Copyright 2008 Zoffix Znet, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.