| WWW-Yahoo-Groups documentation | view source | Contained in the WWW-Yahoo-Groups distribution. |
WWW::Yahoo::Groups - Automated access to Yahoo! Groups archives.
my $y = WWW::Yahoo::Groups->new();
$y->login( $user => $pass );
$y->list( 'Jade_Pagoda' );
my $email = $y->fetch_message( 2345 );
# Error catching
$email = eval { $y->fetch_message( 93848 ) };
if ( $@ and ref $@ and $@->isa('X::WWW::Yahoo::Groups') )
{
    warn "Problem: ".$@->error;
}
WWW::Yahoo::Groups retrieves messages from the Yahoo! Groups
archive. It provides a simple OO interface for logging in and retrieving
those messages, which you may then process as you wish.
Try to be a well-behaved bot and sleep() for a few seconds (at least)
after each request. It's considered polite. The autosleep method
is useful for this.
autosleep defaults to 1 second. Feel free to tweak it
if necessary.
If you're used to seeing munged email addresses when you view
the message archive (i.e. you're not a moderator or owner of
the group) then you'll be pleased to know that
WWW::Yahoo::Groups can demunge those email addresses.
All exceptions are subclasses of X::WWW::Yahoo::Groups, itself a
subclass of Exception::Class. See WWW::Yahoo::Groups::Errors for
details.
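Because every exception shares one base class, a handler can test for the base class first and then for a more specific subclass. The following is only a sketch of that dispatch pattern: `Demo::X` and `Demo::X::NotThere` are hypothetical stand-ins defined here so that the code runs without the distribution installed, and `describe_failure` is an invented helper name. With the real module you would test against X::WWW::Yahoo::Groups and its subclasses instead.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Stand-in classes (assumption): Demo::X plays the role of
# X::WWW::Yahoo::Groups, Demo::X::NotThere the role of one subclass.
package Demo::X;
sub new   { my ( $class, %args ) = @_; return bless {%args}, $class }
sub throw { my $class = shift; die $class->new(@_) }
sub error { my $self = shift; return $self->{error} }

package Demo::X::NotThere;
our @ISA = ('Demo::X');

package main;

# Hypothetical helper: turn whatever eval left in $@ into a short report.
sub describe_failure {
    my ($err) = @_;

    # Plain strings and foreign objects are not ours to interpret.
    return "unknown: $err"
        unless blessed($err) and $err->isa('Demo::X');

    # Most specific class first, base class last.
    return 'missing: ' . $err->error if $err->isa('Demo::X::NotThere');
    return 'problem: ' . $err->error;
}
```

Checking the subclass before the base class matters, since an object of the subclass also passes the base-class test.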
Spidering Hacks from O'Reilly (http://www.oreilly.com/catalog/spiderhks/) is a great book for anyone wanting to know more about screen-scraping and spidering.
There is a WWW::Yahoo::Groups based hack by Andy Lester:
and two hacks, not related to this module, by me, Iain Truskett:
Create a new WWW::Yahoo::Groups robot.
my $y = WWW::Yahoo::Groups->new();
It can take a hash of named arguments. Two arguments are defined:
debug and autosleep. They correspond to the methods of the same
name.
my $y = WWW::Yahoo::Groups->new(
    debug => 1,
    autosleep => 4,
);
Enable/disable/read debugging mode.
$y->debug(0); # Disable
$y->debug(1); # Enable
warn "Debugging!" if $y->debug();
The debug method of the current agent object will
be invoked with the truth value of the argument. This usually means
debug in WWW::Yahoo::Groups::Mechanize.
If given a parameter, it sets the number of seconds to sleep. Otherwise, it returns the current setting. Defaults to 1 second.
$y->autosleep( 5 ); # Set it to 5.
sleep ( $y->autosleep() );
May throw X::WWW::Yahoo::Groups::BadParam if given invalid parameters.
This is used by get. If autosleep is set, then get will
sleep() for the specified period after every fetch.
Implemented by the object returned by agent. By default this means autosleep in WWW::Yahoo::Groups::Mechanize.
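The behaviour described above (one method that both sets and reads the delay, and a get that pauses after each fetch) can be sketched roughly as follows. This is not the code of WWW::Yahoo::Groups::Mechanize: `Demo::Agent` is a hypothetical class, the fetch is delegated to a callback so that nothing touches the network, and the "non-negative integer" check is an assumption standing in for whatever validation the real module performs before throwing BadParam.

```perl
use strict;
use warnings;

package Demo::Agent;    # hypothetical stand-in class

sub new {
    my ( $class, $fetcher ) = @_;

    # Default delay of 1 second, matching the documented default.
    return bless { autosleep => 1, fetcher => $fetcher }, $class;
}

# Combined getter/setter: with an argument it stores the delay,
# without one it returns the current delay.
sub autosleep {
    my $self = shift;
    if (@_) {
        my $secs = shift;
        die "autosleep expects a non-negative integer\n"
            unless defined $secs and $secs =~ /^\d+$/;
        $self->{autosleep} = $secs;
    }
    return $self->{autosleep};
}

# Fetch through the supplied callback, then pause if a delay is set.
sub get {
    my ( $self, $url ) = @_;
    my $content = $self->{fetcher}->($url);
    sleep( $self->autosleep ) if $self->autosleep;
    return $content;
}

package main;
```

Setting the delay to 0 disables the pause entirely, which is convenient in tests but impolite against a live site.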
Logs the robot into the Yahoo! Groups system.
$y->login( $user => $passwd );
May throw:
X::WWW::Yahoo::Groups::BadFetch if it cannot fetch any of the appropriate pages.
X::WWW::Yahoo::Groups::BadParam if given invalid parameters.
X::WWW::Yahoo::Groups::BadLogin if unable to log in for some reason (error will be given the text of the Yahoo error).
X::WWW::Yahoo::Groups::AlreadyLoggedIn if the object is already logged in. I intend to make this exception redundant, perhaps by making login a no-op if we're already logged in, or by calling logout and then logging in again.
Logs the robot out of the Yahoo! Groups system.
$y->logout();
May throw:
X::WWW::Yahoo::Groups::BadFetch if it cannot fetch any of the appropriate pages.
X::WWW::Yahoo::Groups::BadParam if given invalid parameters.
X::WWW::Yahoo::Groups::NotLoggedIn if the bot is already logged out (or never logged in).
Returns 1 if you are logged in, else 0. Note that this merely tests if you've used the login method successfully, not whether the Yahoo! site has expired your session.
print "Logged in!\n" if $y->loggedin();
If given a parameter, it sets the list to use. Otherwise, it returns
the current list, or undef if no list is set.
IMPORTANT: list name must be correctly cased as per how Yahoo! Groups cases it. If not, you may experience odd behaviour.
$y->list( 'Jade_Pagoda' );
my $list = $y->list();
May throw X::WWW::Yahoo::Groups::BadParam if given invalid parameters.
See also lists for how to get a list of possible lists.
If you'd like a list of the groups to which you are subscribed, then use this method.
my @groups = $y->lists();
May throw X::WWW::Yahoo::Groups::BadParam if given invalid
parameters, or X::WWW::Yahoo::Groups::BadFetch if it cannot fetch any
of the appropriate pages from which it extracts the information.
Note that it does handle people with more than one page of groups.
Returns the lowest message number in the archive.
my $first = $y->first_msg_id();
It will throw X::WWW::Yahoo::Groups::NoListSet if no list has been
specified with list, X::WWW::Yahoo::Groups::UnexpectedPage if
the page fetched does not contain anything we thought it would, and
X::WWW::Yahoo::Groups::BadFetch if it is unable to fetch the page it
needs.
Returns the highest message number in the archive.
my $last = $y->last_msg_id();
# Fetch last 10 messages:
my @messages;
for my $number ( ($last-9) .. $last )
{
    push @messages, $y->fetch_message( $number );
}
It will throw X::WWW::Yahoo::Groups::NoListSet if no list has been
specified with list, X::WWW::Yahoo::Groups::UnexpectedPage if
the page fetched does not contain anything we thought it would, and
X::WWW::Yahoo::Groups::BadFetch if it is unable to fetch the page it
needs.
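When fetching "the last N messages" it is easy to be off by one, or to run below the first archived message in a small group. A minimal sketch of the range arithmetic, using only the values that first_msg_id and last_msg_id return, might look like this. `newest_ids` is a hypothetical helper, not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: IDs of the newest $count messages,
# never going below the lowest archived ID.
sub newest_ids {
    my ( $first, $last, $count ) = @_;
    return () if $count < 1 or $last < $first;

    # An inclusive range of $count IDs ending at $last starts here.
    my $start = $last - $count + 1;
    $start = $first if $start < $first;

    return ( $start .. $last );
}
```

The IDs in the range are candidates only: deleted or never-archived messages leave gaps, so each fetch_message call should still be wrapped in eval to catch X::WWW::Yahoo::Groups::NotThere.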
Fetches a specified message from the list's archives. Returns it as a mail message (with headers) suitable for saving into a Maildir.
my $message = $y->fetch_message( 435 );
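Since the returned text is a complete mail message, it can be delivered into a Maildir by writing it under tmp/ and then renaming it into new/. The sketch below shows that convention using only core modules. `save_to_maildir` is a hypothetical helper, and the file-name scheme (time, process ID plus a counter, host name) is an assumption loosely modelled on common Maildir practice rather than anything this module provides.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use Sys::Hostname qw(hostname);

my $delivered = 0;    # per-process counter keeps names unique

# Hypothetical helper: write one message into a Maildir.
sub save_to_maildir {
    my ( $maildir, $message ) = @_;

    # Create tmp/, new/ and cur/ if they do not exist yet.
    make_path( map { File::Spec->catdir( $maildir, $_ ) } qw(tmp new cur) );

    my $name = join '.', time(), sprintf( 'P%dQ%d', $$, $delivered++ ),
        hostname();
    my $tmp = File::Spec->catfile( $maildir, 'tmp', $name );
    my $new = File::Spec->catfile( $maildir, 'new', $name );

    # Write the whole message under tmp/ first ...
    open my $fh, '>', $tmp or die "Cannot write $tmp: $!\n";
    print {$fh} $message or die "Cannot write $tmp: $!\n";
    close $fh or die "Cannot close $tmp: $!\n";

    # ... then move it into new/ so readers never see a partial file.
    rename $tmp, $new or die "Cannot move $tmp to $new: $!\n";
    return $new;
}
```

A typical call would be `save_to_maildir( "$ENV{HOME}/Maildir", $message )`, where the path is whatever Maildir your mail reader uses.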
May throw any of:
X::WWW::Yahoo::Groups::BadFetch if it cannot fetch any of the appropriate pages.
X::WWW::Yahoo::Groups::BadParam if given invalid parameters.
X::WWW::Yahoo::Groups::NoListSet if no list is set.
X::WWW::Yahoo::Groups::UnexpectedPage if we fetched a page and it was not what we thought it was meant to be.
X::WWW::Yahoo::Groups::NotThere if the message does not exist in the archive (any of deleted, never archived or you're beyond the range of the group).
This does some simple reformatting of headers. Yahoo! Groups seems to
manage to mangle multiline headers. This is particularly noticeable with
the Received header.
The rule is that any line that starts with a series of lowercase letters or hyphens NOT immediately followed by a colon is treated as a continuation of the previous line and is indented with a space character (as per RFC 2822).
Input to this method should be a whole message. Output is that same message, with the headers repaired.
This method is called by fetch_message but this was not always the case. If you have archives that predate this implicit call, you may want to run messages through this routine.
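For archives saved before the repair became implicit, the stated rule can be approximated with a few lines of standalone Perl. This is a sketch of the rule as described here, not the module's actual implementation. `repair_headers` is a hypothetical name, and two choices are assumptions: header names are matched case-insensitively (so that "Received:" counts as a header), and only the part before the first blank line is touched.

```perl
use strict;
use warnings;

# Hypothetical helper: re-indent header lines that lost their folding.
sub repair_headers {
    my ($message) = @_;

    # Headers end at the first blank line; the body is left alone.
    my ( $head, $body ) = split /\n\n/, $message, 2;

    my @fixed;
    for my $line ( split /\n/, $head ) {

        # A leading run of letters or hyphens that is not followed by a
        # colon marks a continuation of the previous header line.
        if ( @fixed and $line =~ /^[a-z-]+(?![a-z:-])/i ) {
            $line = " $line";
        }
        push @fixed, $line;
    }

    my $repaired = join "\n", @fixed;
    return defined $body ? "$repaired\n\n$body" : $repaired;
}
```

For example, a "Received: from a.example" line followed by a bare "by b.example" line comes back with the second line indented by one space, while a following "Subject: hi" line is left untouched.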
Returns the RSS for the group's most recent messages. See XML::Filter::YahooGroups for ways to add the message bodies to this RSS.
my $rss = $y->fetch_rss();
If a parameter is given, it will return that many items in the RSS file. The number must be between 1 and 100 inclusive.
my $rss = $y->fetch_rss( 10 );
Returns or sets the WWW::Mechanize based agent. Not for general use.
If you must fiddle with it, your object's API must match that of
WWW::Yahoo::Groups::Mechanize and WWW::Mechanize.
Fetch a given URL. Delegated to "get" in WWW::Yahoo::Groups::Mechanize
(well, the get method of the object returned by agent).
This method does nothing as Yahoo changed their algorithm.
This checks whether a given URL is to a protected email or not. It
returns $text regardless as I do not have a decoding algorithm for
Yahoo's updated encoding scheme.
my $text = $self->_check_protected( $url, $text );
Simon Hanmer for having problems with the module, thus resulting in improved error reporting, param validation and corrected prerequisites. Since then, Simon also provided a basis for the lists and last_msg_id methods and is causing me to think harder about my exceptions.
Aaron Straup Cope (ASCOPE) for writing XML::Filter::YahooGroups which uses this module for retrieving message bodies to put into RSS.
Randal Schwartz (MERLYN) for pointing out some problems back in 1.4 and noting problems caused by the hash randomisation.
Ray Cielencki (SLINKY) for first_msg_id and "Age Restricted" notice
bypassing.
Vadim Zeitlin for yahoo2mbox from which I blatantly stole some features. (Well, I say stole but yahoo2mbox is public domain).
Andy Lester (PETDANCE) for writing about this module in Spidering Hacks.
Terrence Brannon (TBONE) for reporting the example program and empty body bugs.
Support for this module is provided courtesy of the CPAN RT system via the web or email:
http://perl.dellah.org/rt/yahoogroups
bug-www-yahoo-groups@rt.cpan.org
This makes it much easier for me to track things and thus means your problem is less likely to be neglected.
Please include the versions of WWW::Yahoo::Groups and Perl
that you are using and, if possible, the name of the group and
the number of any messages you are having trouble with.
Copyright © Iain Truskett, 2002-2003. All rights reserved.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.000 or, at your option, any later version of Perl 5 you may have available.
The full text of the licenses can be found in the Artistic and COPYING files included with this module, or in perlartistic and perlgpl as supplied with Perl 5.8.1 and later.
Iain Truskett <spoon@cpan.org>
perl, XML::Filter::YahooGroups, http://groups.yahoo.com/.