| Tie-Google documentation | Contained in the Tie-Google distribution. |
Tie::Google - Single-variable access to Google search results
my $KEYFILE = glob "~/.googlekey";
my ($g, @g, %g);
# Tied array interface
tie @g, "Tie::Google", $KEYFILE, "perl";
for my $r (@g) {
printf " * <a href='%s'>%s</a>\n",
$r->{'URL'}, $r->{'title'};
}
# Tied hash interface
tie %g, "Tie::Google", $KEYFILE;
for my $term (qw[ perl python ruby ]) {
my $res = $g{$term};
printf "%d results for '%s:\n", scalar @$res, $term;
for my $r (@$res) {
printf " * <a href='%s'>%s</a>\n",
$r->{'URL'}, $r->{'title'};
}
}
# Tied scalar interface: I Feel Lucky
use LWP::Simple qw(getprint);
tie $g, "Tie::Google", $KEYFILE, "perl";
getprint($g->{'URL'});
Using tied variables can make searching Google much simpler for
trivial programs. Tie::Google presents a simple interface to
Google's search API, using Net::Google as the underlying transport
mechanism. To use Tie::Google, you must already be registered with
Google and have an API key.
You can tie scalars, arrays, or hashes to the Tie::Google class;
each offers slightly different functionality, but all offer direct
access to Google search results. The basic syntax of all types is:
tie VAR, 'Tie::Google', $APIKEY, $QUERY, \%OPTIONS;
where:
VAR is the variable name, which can be a scalar, array, or hash:
tie $g, "Tie::Google", $KEY, $term; tie @g, "Tie::Google", $KEY, $term; tie %g, "Tie::Google", $KEY, $term;
APIKEY is your Google API key or a file containing the key as the only item on the first line; see http://apis.google.com for details.
QUERY is your actual search term(s), as a string. This can be arbitrarily complex, up to the limits Google allows:
tie $g, "Tie::Google", $KEY,
"site:cnn.com allintitle:priest court -judas";
Any options specified in this hashref will be passed to the
Net::Google::Search instance. Available options include starts_at,
max_results, ie, oe, and lr. See Net::Google.
Tieing an array to Tie::Google gives you an array of search
results. How many search results are returned depends on the value of
the max_results option defined when the array was tied (or
$DEFAULT_BATCH_SIZE if max_results was not set), though extending
the array of results can be done by growing the array.
$#g = 20;
Will resize the result set to 20 results. If there are more than 20, the ones on the end will be popped off; if there are less than 20, then more will be retrieved.
Tie::Google supports all non-additive array operations, including
shift, pop, and the 3 argument form of splice (not the 4
argument version). Specifically unallowed are unshift, push,
and general assigment.
See "RESULTS" for details about the individual search results.
The tied hash interface is similar to the tied array interface, except that there are a bunch of them. Asking for a key in the hash %g initiates a search to Google, with the specified key as the search term:
my $results = $g{'perl apache'};
This initiates a search with "perl apache" as the query. $results is a reference to an array of hashrefs (see "RESULTS" for details about said hashrefs).
Tied hashes support all hash functions, including each, keys,
and values. Deleting from the hash is allowed (it removes the
search results for the deleted terms), but adding to the hash is not.
There can be many sets of search results stored in a tied hash. To see what this looks like, try this:
use Data::Dumper;
my (%g, $KEY, $dummy);
tie %g, "Tie::Google", $KEY;
$dummy = $g{'perl'};
$dummy = $g{'python'};
$dummy = $g{'ruby'};
print Dumper(tied(%g));
Also, for comparison, try:
tie @g, "Tie::Google", $KEY, "perl"; print Dumper(tied(@g));
If starts_at or max_results are specified during the tie,
these options are carried over into new searches (when a new key is
requested from the hash), so plan accordingly. If the max_value is
set to 1000, for example, then every access of a new key is going to
contain 1000 elements, which will be pretty slow.
Do you feel lucky? If so, tie a scalar to Tie::Google:
tie $g, "Tie::Google", $KEY, "python";
Will give you the top result. This is conceptually similar to using the "I Feel Lucky" button on Google.com's front search interface.
All results (values returned from these tied variables) are hash
references; the contents of these hashrefs are based on the
Net::Google::Response class (see Net::Google::Response for
details). These elements currently are:
All keys are case sensitive, and return exactly what
Net::Google::Response says they do (Tie::Google does no
massaging of this data).
This module is far from complete, or even fully thought out. TODO items currently include:
$#g = 100;
Tie::Google. I
started doing it this way but it made my brain hurt. while (<$google>) {
...
Net::Google::Result instances (where the
data is currently being derived from), or new objects in their own
right (e.g., Tie::Google::Result)? I see advantages to each path:
data structures would be simpler, passing on Result objects without
modification would be faster, and using a new set of objects allows
new functionality to be added, for example useful stringification.Net::Google, DBD::google
darren chamberlain (<darren@cpan.org>), with some prompting from Richard Soderberg.
| Tie-Google documentation | Contained in the Tie-Google distribution. |
package Tie::Google; # ---------------------------------------------------------------------- # $Id: Google.pm,v 1.3 2003/04/01 14:48:34 dlc Exp $ # ---------------------------------------------------------------------- # Apparently, a few people thought this would be a neat idea. # The initial email I recieved on the topic: # # From: Richard Soderberg <blog@crystalflame.net> # Date: Thu, 13 Feb 2003 10:42:07 -0500 # To: darren@cpan.org # Subject: Tie::Google? # # #perl found DBD::google recently. I thought I'd point out that there'd # also be a great market for Tie::Google. =) # # my @results; tie @results, 'Tie::Google', $KEY, $saerch_string; # do ... for my $result { grep { $_->{url} =~ /google.com/ } @results }; # # - R. # # To which I cavalierly responded with something ridiculous, like "OK". use strict; use vars qw($VERSION $DEBUG $DEFAULT_BATCH_SIZE); use Carp qw(carp); use Symbol qw(gensym); use Net::Google; # Offsets into the array-based object sub KEY() { 0 } # The user's API key sub TYPE() { 1 } # The tie type: SCALAR, ARRAY, HASH sub QUERY() { 2 } # The query terms sub OPTIONS() { 3 } # Options tied in sub DATA() { 4 } # Search results sub GOOGLE() { 5 } # Net::Google instance # Tie types that we support and have to differentiate between sub SCALAR() { 0 } sub ARRAY() { 1 } sub HASH() { 2 } $VERSION = 0.03; $DEFAULT_BATCH_SIZE = 10 unless defined $DEFAULT_BATCH_SIZE; $DEBUG = 0 unless defined $DEBUG; # tie constructors sub TIESCALAR { return shift->new(SCALAR, @_) } sub TIEARRAY { return shift->new(ARRAY, @_) } sub TIEHASH { return shift->new(HASH, @_) } # ---------------------------------------------------------------------- # new($TYPE, $KEY, $query) # # Create a new Tie::Google instance. This method should never be # called by outside facing code, only by the TIEFOO methods. # # A Tie::Google instance maintains a few pieces of information: # # 0. The API key of the user # # 1. The type of tie (SCALAR, ARRAY, HASH) # # 2. The query # # 3. Options passed in # # 4. The results of the query. # # 5. The Net::Google instace, created with the API key (item 0) # # The interesting things happen here in new. new must be passed a # type, a key, and a query. If the user is tieing a scalar ("I feel # lucky"), then we are only interested in the first element returned # by the search; if the user is tieing an array, then we want all of # the elements; if the user is tieing a hash, then we want all of the # elements of a number of searches. My "solution" (heh heh heh) is to # store all data in a hashref, indexed into the instance as DATA; if # the user is tieing an array or scalar (Tie::Google treats them # identically), then we store the results in the key named $KEY, so # that we can treat all types of ties consistently; otherwise, search # results are keyed by query. # ---------------------------------------------------------------------- sub new { my ($class, $type, $KEY, $query, $options) = @_; $options = { } unless defined $options && ref($options) eq 'HASH'; my $self = bless [ $KEY, $type, $query, $options, { }, undef, ] => $class; # Is $KEY actually a file? # I do this in DBD::google as well; perhaps there I should submit # a patch to Aaron so that Net::Google can do this directly. if (-e $KEY) { my $fh = gensym; open $fh, $KEY or die "Can't open keyfile $KEY for reading: $!"; chomp($KEY = <$fh>); close $fh or die "Can't close keyfile $KEY: $!"; $self->[KEY] = $KEY; } # Set some reasonable defaults search boundaries, and instantiate # the Net::Google instance. $options->{'starts_at'} = 0 unless defined $options->{'starts_at'}; $options->{'max_results'} ||= $DEFAULT_BATCH_SIZE; $self->[GOOGLE] = Net::Google->new(key => $self->[KEY], debug => $options->{'debug'} || 0); # * If called from TIEHASH, then store the results keyed by # search terms, otherwise keyed by $KEY # # * If called from TIESCALAR, we only want the first result. # # $self->[OPTIONS] contains starts_at and max_results. # if ($type == HASH) { $self->do_search($query, $query, $self->[OPTIONS]->{'starts_at'}, $self->[OPTIONS]->{'max_results'}); } elsif ($type == SCALAR) { $self->do_search($KEY, $query, 0, 1); } else { $self->do_search($KEY, $query, $self->[OPTIONS]->{'starts_at'}, $self->[OPTIONS]->{'max_results'}); } return $self; } # ---------------------------------------------------------------------- # do_search($store_as, $query) # # This is where all the Net::Google magic has to happen. # # do_search will use Net::Google to search for $query, and store the # results in $self->[DATA]->{$store_as}. # ---------------------------------------------------------------------- sub do_search { my ($self, $store_as, $query, $start, $num) = @_; # Preparation for the search # # do_search can conceivably be invoked with one argument, # in which case we use it both as search term and key into the # DATA hash. # # $start and $num will be taken from OPTIONS->{'starts_at'} and # OPTIONS->{'max_results'}, respectively, if they are not # provided. return unless $store_as; $query ||= $store_as; $start = $self->[OPTIONS]->{'starts_at'} unless defined $start; $num ||= $self->[OPTIONS]->{'max_results'}; # The search my $search = $self->[GOOGLE]->search(%{ $self->[OPTIONS] }); $search->query($query); $search->starts_at($start); $search->max_results($num); $self->[DATA]->{$store_as} = [ map { +{ title => $_->title(), URL => $_->URL(), snippet => $_->snippet(), cachedSize => $_->cachedSize(), directoryTitle => $_->directoryTitle(), summary => $_->summary(), hostName => $_->hostName(), directoryCategory => $_->directoryCategory(), } } @{ $search->results } ]; } # ---------------------------------------------------------------------- # is_scalar(), is_array(), is_hash() # # Utility methods; is the object a tied scalar, tied array, or tied hash? # ---------------------------------------------------------------------- sub is_scalar { shift->[TYPE] == SCALAR } sub is_array { shift->[TYPE] == ARRAY } sub is_hash { shift->[TYPE] == HASH } # ---------------------------------------------------------------------- # FETCH() # scalar # FETCH($int) # array # FETCH($key) # hash # # In the case of a tied scalar or a tied array, the search should # already have been performed. In the case of a tied hash, that might # not necessarily be the case, so we might have to do a search. # # Needed by tied scalar, tied hash, and tied array implementations. # ---------------------------------------------------------------------- sub FETCH { my ($self, $index) = @_; $index = 0 unless defined $index; if ($self->is_hash) { $self->do_search($index, $index) unless exists $self->[DATA]->{$index}; return $self->[DATA]->{$index}; } return $self->[DATA]->{$self->[KEY]}->[$index]; } # ---------------------------------------------------------------------- # EXISTS($item) # # Returns true if this item exists in the instances search results: # # tie %g, "Tie::Google", $KEY, "perl"; # print exists $g{"perl"}; # # tie @g, "Tie::Google", $KEY, "perl"; # print exists $g[2]; # does this work? # # Needed for tied hash and tied array implementation. # ---------------------------------------------------------------------- sub EXISTS { my ($self, $index) = @_; return exists $self->[DATA]->{$index} if $self->is_hash; return exists $self->[DATA]->{$self->[KEY]}->[$index]; } # ---------------------------------------------------------------------- # CLEAR() # # Clears out the search results: # # tie @g, "Tie::Google", $KEY, "perl"; # @g = (); # # tie %g, "Tie::Google", $KEY, "perl"; # print $g{"apache"}; # print $g{"python"}; # %g = (); # # Needed by the tied hash and tied array interfaces. # ---------------------------------------------------------------------- sub CLEAR { my $self = shift; return %{$self->[DATA]} = () if $self->is_hash; return @{$self->[DATA]->{$self->[KEY]}} = (); } # ---------------------------------------------------------------------- # FIRSTKEY() # # Needed for each(%g). This implementation is taken from Tie::Hash. # # NOTE: This only iterates over keys that are _already defined_! It # _does not_ attempt to iterate over all of Google, or anything silly # like that. Although that would be great fun... # # Needed for tied hash implementation. # ---------------------------------------------------------------------- sub FIRSTKEY { my $self = shift; my $a = scalar keys %{$self->[DATA]}; each %{$self->[DATA]} } # ---------------------------------------------------------------------- # NEXTKEY() # # Needed for each(%g). This implementation is taken from Tie::Hash. # # See NOTE for FIRSTKEY. # # Needed for tied hash implementation. # ---------------------------------------------------------------------- sub NEXTKEY { my $self = shift; each %{$self->[DATA]} } # ---------------------------------------------------------------------- # DELETE($index) # # Remove an item from the search results list. # # Needed for tied hash and tied array implementation. # ---------------------------------------------------------------------- sub DELETE { my ($self, $index) = @_; return delete $self->[DATA]->{$index} if $self->is_hash; return delete $self->[DATA]->{$self->[KEY]}->{$index} if $self->is_array; } # ---------------------------------------------------------------------- # STORE($index, $value) # # Anyone calling this method either misunderstands Google or is # intentionally attempting to push the limits of this module. # # Nothing should be able to store anything here, right? # # NOTE: This means that these instances are effectively static once # they are initialized! # # Needed for tied scalar, tied array, and tied hash implementations. # ---------------------------------------------------------------------- sub STORE { carp("Misguided attempt to modify Google's database") } # ---------------------------------------------------------------------- # STORESIZE($count) # # Called when the user does: # # $#g = 100; # # If $count > current number of elements, then extend the search. # If $count < current number of elements, then drop some. # # Needed for tied array implementation. # ---------------------------------------------------------------------- sub STORESIZE { my ($self, $count) = @_; my $arr = $self->[DATA]->{$self->[KEY]}; my $cur_total = scalar @$arr; if ($count > $cur_total) { $self->do_search($self->[KEY], $self->[QUERY], 0, $count); } elsif ($count == $cur_total) { # la la la... } else { pop @$arr while @$arr > $count; } return $self->FETCHSIZE(); } # ---------------------------------------------------------------------- # FETCHSIZE() # # Needed for tied array implementation. # ---------------------------------------------------------------------- sub FETCHSIZE { my $self = shift; scalar @{$self->[DATA]->{$self->[KEY]}}; } # ---------------------------------------------------------------------- # EXTEND($size) # # Called when the user does: # # @g = 100; # # Needed for tied array implementation. # # XXX This implementation might not be right. # ---------------------------------------------------------------------- sub EXTEND { shift->STORESIZE(@_) } # ---------------------------------------------------------------------- # PUSH($item) # # Needed for tied array implementation. # ---------------------------------------------------------------------- sub PUSH { carp "Can't add results to Google's database -- do it the old " . "fashioned way, please!"; } # ---------------------------------------------------------------------- # POP() # # Removes the last search result from the list of results, and # returns it. # # Needed for tied array implementation. # ---------------------------------------------------------------------- sub POP { my $self = shift; pop @{$self->[DATA]->{$self->[KEY]}}; } # ---------------------------------------------------------------------- # SHIFT() # # Needed for tied array implementation. # ---------------------------------------------------------------------- sub SHIFT { my $self = shift; shift @{$self->[DATA]->{$self->[KEY]}}; } # ---------------------------------------------------------------------- # UNSHIFT($item) # # Needed for tied array implementation. # ---------------------------------------------------------------------- sub UNSHIFT { carp "Trying to stick your results into the head of Google's ". "list, eh? Shame on you!"; } # ---------------------------------------------------------------------- # SPLICE($offset, $limit, @list) # # Needed for tied array implementation. # ---------------------------------------------------------------------- sub SPLICE { my ($self, $offset, $limit) = @_; my $arr = $self->[DATA]->{$self->[KEY]}; if (@_ > 3) { carp "Can't modify search results this way. Please stuff ". "Google the old fashioned way."; return; } splice @$arr, $offset, $limit; } # ---------------------------------------------------------------------- # DESTROY() # # # Needed by the tied hash, tied array, and tied scalar interfaces. # ---------------------------------------------------------------------- sub DESTROY { } 1; __END__ XXX The TIEARRAY implementation is incomplete XXX The Net::Google integration is completely missing Tests!