| SWISH-Prog documentation | Contained in the SWISH-Prog distribution. |
SWISH::Prog::Aggregator::DBI - index DB records with Swish-e
    use SWISH::Prog::Aggregator::DBI;
    use SWISH::Prog::Indexer::Native;
    use Carp;

    my $aggregator = SWISH::Prog::Aggregator::DBI->new(
        db => [
            "DBI:mysql:database=movies;host=localhost;port=3306",
            'some_user', 'some_secret_pass',
            {
                RaiseError  => 1,
                HandleError => sub { confess(shift) },
            }
        ],
        schema => {
            'moviesIlike' => {
                title            => { type => 'char', bias => 1 },
                synopsis         => { type => 'char', bias => 1 },
                year             => { type => 'int',  bias => 1 },
                director         => { type => 'char', bias => 1 },
                producer         => { type => 'char', bias => 1 },
                awards           => { type => 'char', bias => 1 },
                date             => { type => 'date', bias => 1 },
                swishdescription => { synopsis => 1, producer => 1 },
                swishtitle       => 'title',
            }
        },
        alias_columns => 1,
        indexer       => SWISH::Prog::Indexer::Native->new,
    );

    $aggregator->crawl();
SWISH::Prog::Aggregator::DBI is a SWISH::Prog::Aggregator subclass that provides full-text search over database tables.
Since SWISH::Prog::Aggregator::DBI inherits from SWISH::Prog::Aggregator, read that documentation first. Any overridden methods are documented here.
new( opts ) creates a new aggregator object.
The following opts are required:
db => connect_info. connect_info is passed to DBI's connect() method, so see the DBI docs for syntax. If connect_info is already a DBI database handle, it is used as is. If it is an array ref, it is dereferenced and the resulting list is passed to connect(). Any other value is passed to connect() unchanged.
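The three cases can be sketched as follows. This is only an illustration of the dispatch, not the module's own code: normalize_db() is a hypothetical helper, and the $connect callback is a stand-in for DBI->connect so the sketch runs without DBI or a database.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Hypothetical helper: mirrors how the db option is interpreted.
# $connect stands in for DBI->connect so no database is needed.
sub normalize_db {
    my ( $db, $connect ) = @_;
    die "db required\n" unless defined $db;

    # array ref: dereference and hand the list to connect()
    return $connect->( @{$db} ) if ref($db) eq 'ARRAY';

    # already a database handle: use it unchanged
    return $db if blessed($db) && $db->isa('DBI::db');

    # anything else (e.g. a DSN string): pass it straight through
    return $connect->($db);
}

# Stub connector that just records its arguments.
my $connect = sub { return { args => [@_] } };

my $from_list = normalize_db(
    [ 'DBI:mysql:database=movies', 'some_user', 'some_secret_pass' ],
    $connect );
my $from_dsn = normalize_db( 'DBI:mysql:database=movies', $connect );

# An object blessed into DBI::db plays the role of an open handle.
my $handle   = bless {}, 'DBI::db';
my $from_dbh = normalize_db( $handle, $connect );
```

Passing an existing handle is the natural choice when the application already holds a connection it wants the aggregator to reuse.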
schema => db_schema. db_schema is a hashref keyed by table name. Each value is a hashref of column descriptions, keyed by column name; each description is a hashref with type and bias keys. A missing type defaults to 'char' and a missing bias to 1. See the SYNOPSIS.
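As a rough sketch of the shape described above, the following core-Perl snippet checks a schema hashref and fills in the defaults that the module source applies (type 'char', bias 1). check_schema() is a hypothetical name used for illustration; the module performs these checks inside init().

```perl
use strict;
use warnings;

# Hypothetical helper: validate a schema hashref and apply defaults.
sub check_schema {
    my ($schema) = @_;
    die "schema must be a hashref\n" unless ref($schema) eq 'HASH';
    for my $table ( sort keys %$schema ) {
        my $cols = $schema->{$table};
        die "column descriptions must be a hashref\n"
            unless ref($cols) eq 'HASH';
        for my $colname ( sort keys %$cols ) {
            my $desc = $cols->{$colname};
            die "$colname description must be a hashref\n"
                unless ref($desc) eq 'HASH';
            $desc->{type} ||= 'char';    # default column type
            $desc->{bias} ||= 1;         # default ranking bias
        }
    }
    return $schema;
}

# A valid schema: missing keys are filled in.
my $schema = check_schema(
    {   moviesIlike => {
            title => {},
            year  => { type => 'int' },
        }
    }
);

# An invalid schema: the column description is a string, not a hashref.
my $error
    = eval { check_schema( { moviesIlike => { title => 'char' } } ); 1 }
    ? ''
    : $@;
```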
There are two special column names: swishtitle and swishdescription.
These are reserved for mapping real column names to Swish-e property names
for returning in search results. swishtitle should be the name of a column,
and swishdescription should be a hashref of column names to include
in the StoreDescription value.
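To make the mapping concrete, here is a simplified sketch of how one row might be serialized, modeled on the module's private _row2xml() method. The value of the swishtitle column goes into a <swishtitle> tag, and each column named in swishdescription is wrapped in <_desc>, the tag that StoreDescription is configured to keep. row_to_xml() and escape_xml() are hypothetical stand-ins; the module itself relies on Search::Tools::XML for escaping and tag names.

```perl
use strict;
use warnings;

# Hypothetical minimal escaper; the module uses Search::Tools::XML.
sub escape_xml {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical stand-in for the module's _row2xml().
sub row_to_xml {
    my ( $table, $row, $title_col, $desc_cols ) = @_;
    my $title = $row->{$title_col} || '[ no title ]';
    my $xml
        = "<_${table}_row><table>$table</table>"
        . '<swishtitle>'
        . escape_xml($title)
        . '</swishtitle><_body>';
    for my $col ( sort keys %$row ) {
        my $field = "<$col>" . escape_xml( $row->{$col} ) . "</$col>";

        # columns listed in swishdescription are wrapped for excerpts
        $field = "<_desc>$field</_desc>" if $desc_cols->{$col};
        $xml .= $field;
    }
    return $xml . "</_body></_${table}_row>";
}

my $xml = row_to_xml(
    'moviesIlike',
    { title => 'Brazil', synopsis => 'Paperwork & dreams', year => 1985 },
    'title',              # swishtitle
    { synopsis => 1 },    # swishdescription
);
```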
indexer => a SWISH::Prog::Indexer-derived object.
The following opts are optional:
The alias_columns flag indicates whether all columns should be searchable
under the default MetaName of swishdefault. The default is 1 (true). This
is not the default behaviour of swish-e; this is a feature of SWISH::Prog.
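Judging from the module source, the aliasing is done by mapping each table's top-level row tag (_<table>_row) to swishdefault with a MetaNameAlias directive. The sketch below only builds that directive value; alias_directive() is a hypothetical helper and not part of SWISH::Prog.

```perl
use strict;
use warnings;

# Hypothetical helper: build the MetaNameAlias value for a schema.
# Each table contributes one alias named after its row tag.
sub alias_directive {
    my ($schema) = @_;
    return join ' ', 'swishdefault',
        map { '_' . $_ . '_row' } sort keys %$schema;
}

my $directive = alias_directive( { moviesIlike => {}, actors => {} } );
```

With the two tables above, $directive is 'swishdefault _actors_row _moviesIlike_row', so an unqualified search term matches text from any column of either table.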
NOTE: The new() method simply inherits from SWISH::Prog::Aggregator, so any params valid for that method are allowed here.
init() does all the setup. See SWISH::Prog::Class.
crawl() creates the index and returns the number of rows indexed.
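For each table in the schema, one SELECT over the listed columns is issued and every returned row is indexed. The sketch below shows how such a statement could be assembled. It assumes the reserved swishtitle and swishdescription keys are left out of the column list, and it uses MySQL-style backtick quoting as the module source does; select_for_table() is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: build the per-table SELECT statement.
sub select_for_table {
    my ( $table, $table_info ) = @_;

    # reserved keys describe result mapping, not real columns
    my %reserved = map { $_ => 1 } qw( swishtitle swishdescription );
    my @cols = grep { !$reserved{$_} } sort keys %$table_info;

    # backtick quoting is MySQL-specific
    return 'SELECT `' . join( '`,`', @cols ) . "` FROM $table";
}

my $sql = select_for_table(
    'moviesIlike',
    {   title      => { type => 'char', bias => 1 },
        year       => { type => 'int',  bias => 1 },
        swishtitle => 'title',
    }
);
```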
Peter Karman, <perl@peknet.com>
Please report any bugs or feature requests to bug-swish-prog at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=SWISH-Prog.
I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
    perldoc SWISH::Prog
Copyright 2008-2009 by Peter Karman
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
package SWISH::Prog::Aggregator::DBI;
use strict;
use warnings;
use base qw( SWISH::Prog::Aggregator );
use Carp;
use Data::Dump qw( dump );
use DBI;
use SWISH::Prog::Utils;

__PACKAGE__->mk_accessors(qw( db alias_columns schema ));

our $VERSION = '0.51';

my $XMLer = Search::Tools::XML->new();    # included in Utils

sub init {
    my $self = shift;
    $self->SUPER::init(@_);

    # reserved schema keys that map columns to Swish-e properties;
    # they are not column descriptions and must be skipped below
    my %reserved = map { $_ => 1 } qw( swishtitle swishdescription );

    # verify DBI connection
    if ( defined( $self->db ) ) {
        if ( ref( $self->db ) eq 'ARRAY' ) {
            $self->db( DBI->connect( @{ $self->{db} } ) );
        }
        elsif ( ref( $self->db ) && $self->db->isa('DBI::db') ) {

            # do nothing
        }
        else {
            $self->db( DBI->connect( $self->db ) );
        }
    }
    else {
        croak "db required";
    }

    # verify schema
    if ( defined $self->schema ) {
        my $schema = $self->schema;
        unless ( ref($schema) eq 'HASH' ) {
            croak "schema must be a hashref";
        }
        for my $table ( keys %$schema ) {
            my $cols = $schema->{$table};
            unless ( ref($cols) eq 'HASH' ) {
                croak "column descriptions must be a hashref";
            }
            for my $colname ( keys %$cols ) {
                next if $reserved{$colname};
                my $desc = $cols->{$colname};
                unless ( ref($desc) eq 'HASH' ) {
                    croak "$colname description must be a hashref";
                }
                $desc->{type} ||= 'char'; # TODO auto-make property types based on this.
                $desc->{bias} ||= 1;
            }
        }
    }
    else {
        croak "schema required";
    }

    $self->{alias_columns} = 1 unless exists $self->{alias_columns};

    # unless metanames are defined, use all the column names from schema
    my $m = $self->config->MetaNames;
    unless (@$m) {
        for my $table ( keys %{ $self->{schema} } ) {
            my $columns = $self->{schema}->{$table};

            # group real column names by their bias
            my %ranks;
            for my $colname ( sort keys %$columns ) {
                next if $reserved{$colname};
                push( @{ $ranks{ $columns->{$colname}->{bias} } },
                    $colname );
            }

            for my $rank ( keys %ranks ) {
                $self->config->MetaNamesRank(
                    "$rank " . join( ' ', @{ $ranks{$rank} } ), 1 );
            }
        }
    }

    # alias the top level tags to that default search
    # will match any metaname in any table
    if ( $self->alias_columns ) {
        $self->config->MetaNameAlias(
            'swishdefault '
                . join( ' ',
                map { '_' . $_ . '_row' }
                    sort keys %{ $self->{schema} } ),
            1    # always append
        );
    }

    # add 'table' metaname
    $self->config->MetaNames('table');

    # save all row text in the swishdescription property for excerpts
    $self->config->StoreDescription('XML* <_desc>');
}

sub crawl {
    my $self = shift;

    my @tables = sort keys %{ $self->{schema} };

T: for my $table (@tables) {

        # work on a shallow copy so the schema itself is not modified
        my %table_info = %{ $self->{schema}->{$table} };

        # special col names; remove them before building the column list
        my $desc  = delete( $table_info{swishdescription} ) || {};
        my $title = delete( $table_info{swishtitle} )       || '';

        # which columns to index
        my @cols = sort keys %table_info;

        # TODO test other dbs besides mysql for quoting etc.
        my $c = $self->_do_table(
            name  => $table . ".index",
            sql   => "SELECT `" . join( '`,`', @cols ) . "` FROM $table",
            table => $table,
            desc  => $desc,
            title => $title,
        );

        $self->_increment_count($c);
    }

    return $self->{count};
}

sub _do_table {
    my $self = shift;
    my %opts = @_;

    if ( !$opts{sql} ) {
        croak "need SQL statement to index with";
    }

    $opts{table} ||= '';
    $opts{title} ||= '';

    my $counter = 0;
    my $indexer = $self->indexer;

    my $sth = $self->db->prepare( $opts{sql} )
        or croak "DBI prepare() failed: " . $self->db->errstr;
    $sth->execute or croak "SELECT failed " . $sth->errstr;

    # each row becomes one XML document handed to the indexer
    while ( my $row = $sth->fetchrow_hashref ) {
        my $title = $row->{ $opts{title} } || '[ no title ]';

        my $xml = $self->_row2xml( $XMLer->tag_safe( $opts{table} ),
            $row, $title, \%opts );

        my $doc = $self->doc_class->new(
            content => $xml,
            url     => ++$counter,
            modtime => time(),
            parser  => 'XML*',
            type    => 'application/x-swish-dbi',    # TODO ??
            data    => $row
        );

        $indexer->process($doc);
    }

    $sth->finish;

    return $counter;
}

sub _row2xml {
    my ( $self, $table, $row, $title, $opts ) = @_;

    my $xml
        = "<_${table}_row>"
        . "<table>"
        . $table
        . "</table>"
        . "<swishtitle>"
        . $XMLer->utf8_safe($title)
        . "</swishtitle>"
        . "<_body>";

    for my $col ( sort keys %$row ) {
        my @x = (
            $XMLer->start_tag($col),
            $XMLer->utf8_safe( $row->{$col} ),
            $XMLer->end_tag($col)
        );

        # wrap columns named in swishdescription for StoreDescription
        if ( $opts->{desc}->{$col} ) {
            unshift( @x, '<_desc>' );
            push( @x, '</_desc>' );
        }

        $xml .= join( '', @x );
    }
    $xml .= "</_body></_${table}_row>";

    #$self->debug and warn $xml . "\n";

    return $xml;
}

1;

__END__