WWW::Baidu - Perl interface for the www.baidu.com search engine


WWW-Baidu documentation  | view source Contained in the WWW-Baidu distribution.

NAME

WWW::Baidu - Perl interface for the www.baidu.com search engine

VERSION

This document describes version 0.06 of WWW::Baidu, released Jan 21, 2007.

SYNOPSIS

    use WWW::Baidu;
    my $baidu = WWW::Baidu->new;
    # ensure the keys are in the GBK/GB2312 encoding if they're Chinese
    my $count = $baidu->search('Perl "Larry Wall"', 'Audrey');
    $baidu->limit(200);
    while (my $record = $baidu->next) {
        # the results are in GBK/GB2312 encoding
        print $record->title, "\n",
              $record->url, "\n",
              $record->summary, "\n",
              $record->date, "\n",
              $record->size, "\n",
              $record->cached_url, "\n\n\n";
    }

DESCRIPTION

Baidu.com is a very popular Chinese search engine that works much like Google. This module provides a Perl interface to that site.

METHODS

$obj = WWW::Baidu->new()
$obj = WWW::Baidu->new( cache => $cache )

This is the constructor for WWW::Baidu. It accepts an optional cache argument, which must be a Cache::Cache-compatible object. When given, WWW::Baidu uses this cache instead of a default Cache::FileCache instance.

$count = $obj->search($key1, $key2, ...)

Searches Baidu for the given keys and returns the total number of records reported by Baidu. Note that the return value $count is only an estimate made by Baidu. It usually differs from the number of records that you can actually fetch with the next method.

It's highly recommended to pass only string keys in the GBK or GB2312 encoding.
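
If your keys start out as Perl character strings (for instance, literals in a source file that says `use utf8`), one way to follow this advice is to encode them with the core Encode module before searching. The following is only a minimal sketch of that idea; the helper name to_gbk is hypothetical and is not part of WWW::Baidu.

```perl
use strict;
use warnings;
use utf8;                 # string literals in this file are UTF-8
use Encode qw(encode);

# Hypothetical helper (not part of WWW::Baidu):
# turn Perl character strings into GBK byte strings.
sub to_gbk {
    return map { my $chars = $_; encode('gbk', $chars) } @_;
}

my @keys = to_gbk('中文', 'Perl');
# @keys now holds GBK bytes and could be passed on,
# e.g. $baidu->search(@keys);
```

Pure ASCII keys such as 'Perl' pass through unchanged, since GBK is ASCII-compatible.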

Calling this method clears the internal search results buffer and the iterator counter, but leaves the limit setting intact.

$obj->limit($count)

Limits the total number of records WWW::Baidu will try to offer. The limit affects the next() method. The internal counter is also cleared when the search method is called again.

$record = $obj->next()

Returns the next search result, which is a WWW::Baidu::Record object. WWW::Baidu accesses the baidu.com site rather lazily. That is, it only "clicks" the "Next page" link once the user has fetched all the records in the internal buffer.

When there are no more records (either because Baidu itself has nothing more to offer or because the upper limit set via the limit method has been reached), this method returns undef.
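
Because the record fields come back in the GBK/GB2312 encoding, a caller that prints to a UTF-8 terminal will probably want to convert them first. Here is a minimal sketch using the core Encode module; the helper name gbk_to_utf8 is hypothetical and not part of WWW::Baidu, and the sample bytes merely stand in for a value such as $record->title.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of WWW::Baidu):
# re-encode a GBK byte string as UTF-8 bytes.
sub gbk_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode('gbk', $bytes);   # GBK bytes -> Perl characters
    return encode('UTF-8', $chars);      # characters -> UTF-8 bytes
}

# Stand-in for a field such as $record->title: two Chinese characters in GBK.
my $title_gbk  = "\xD6\xD0\xCE\xC4";
my $title_utf8 = gbk_to_utf8($title_gbk);
print $title_utf8, "\n";
```

Inside a `while (my $record = $baidu->next)` loop, the same helper could be applied to each field before printing.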

CACHING

WWW::Baidu uses WWW::Mechanize::Cached internally, so your program will run much faster during debugging and will also behave more politely toward the Baidu.com site.

CAVEAT

CODE COVERAGE

I use Devel::Cover to measure the code coverage of my tests. Below is the Devel::Cover report on this module's test suite.

    ---------------------------- ------ ------ ------ ------ ------ ------ ------
    File                           stmt   bran   cond    sub    pod   time  total
    ---------------------------- ------ ------ ------ ------ ------ ------ ------
    blib/lib/WWW/Baidu.pm          98.1   84.6   66.7  100.0  100.0  100.0   93.9
    blib/lib/WWW/Baidu/Record.pm  100.0    n/a    n/a  100.0    n/a    0.0  100.0
    Total                          98.2   84.6   66.7  100.0  100.0  100.0   94.3
    ---------------------------- ------ ------ ------ ------ ------ ------ ------

SOURCE CONTROL

You can always get the latest source code from the following Subversion repository:

https://svn.openfoundry.org/wwwbaidu

Anonymous access is open to everyone.

If you would like a commit bit, please let me know. I've been trying to follow Audrey's best practices. ;)

AUTHOR

Agent Zhang <agentzh@gmail.com>

COPYRIGHT

SEE ALSO

WWW::Baidu::Record, http://www.baidu.com, WWW::Mechanize::Cached.

