| Search-Indexer-Incremental-MD5 documentation | Contained in the Search-Indexer-Incremental-MD5 distribution. |
Search::Indexer::Incremental::MD5::Language::Perl - defines Perl-specific data to use with Search::Indexer
  my @perl_extra_arguments = get_perl_word_regex_and_stopwords() ;

  my $searcher
      = eval
          {
          Search::Indexer::Incremental::MD5::Searcher->new
              (
              ...
              @perl_extra_arguments,
              );
          } or croak "No full text index found! $@\n" ;
This module contains the word regex and stopwords specific to Perl.
The word regex and stopwords available in this module are specific to the Perl language. The regex allows Search::Indexer to find precisely what Perl considers a word, while the stopwords limit the number of words indexed.
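The interplay between a word regex and a stopword list can be illustrated with a minimal sketch. The regex and the four-entry stopword list below are simplified stand-ins chosen for illustration, not the ones shipped in this module, and `indexable_words` is a hypothetical helper, not part of any Search::Indexer API.

```perl
use strict;
use warnings;

# Simplified stand-in for the module's word regex: an optional sigil
# followed by an identifier of two or more characters, with optional
# ::components. The module's real regex covers more cases.
my $word_regex = qr{ [\$\@%]? [A-Za-z_]\w+ (?: ::\w+ )* }x;

# A tiny, hypothetical stopword list (the module's list is much longer).
my %is_stopword = map { $_ => 1 } qw(my if the return);

# Hypothetical helper: extract every word, then drop the stopwords.
sub indexable_words {
    my ($text) = @_;
    return grep { !$is_stopword{ lc $_ } } $text =~ /($word_regex)/g;
}

my @words = indexable_words('my $searcher = Search::Indexer->new if $ready;');
print join(', ', @words), "\n";
# prints: $searcher, Search::Indexer, new, $ready
```

The sigil and the package separator stay attached to the word, so a variable name such as `$searcher` and a module name such as `Search::Indexer` are each indexed as a single term, while common keywords such as `my` and `if` are skipped.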
Creates a $word_regex and $stopwords for the Perl language.
Arguments - None
Returns - a list of key/value pairs: WORD_REGEX => $word_regex, STOPWORDS => \@stopwords
Exceptions - None
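As a sketch of how the returned list can be consumed, the example below uses a hypothetical stand-in, `example_word_regex_and_stopwords`, that mimics the return shape seen in the module source (a flat list with the keys WORD_REGEX and STOPWORDS). The stand-in's regex and stopwords are placeholders so the example runs without the distribution installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in with the same return shape as
# get_perl_word_regex_and_stopwords(); the values are placeholders.
sub example_word_regex_and_stopwords {
    my $word_regex = qr/\w\w+/;
    my @stopwords  = qw(my our sub);
    return (WORD_REGEX => $word_regex, STOPWORDS => \@stopwords);
}

# The flat key/value list can be captured in a hash for inspection ...
my %extra = example_word_regex_and_stopwords();
printf "%d stopwords, regex is a %s\n",
    scalar @{ $extra{STOPWORDS} }, ref($extra{WORD_REGEX});

# ... or kept as a plain list and appended to a constructor's
# named arguments, as the synopsis does with @perl_extra_arguments.
my @extra_arguments = example_word_regex_and_stopwords();
```

Because the list is already in key/value form, it can be appended to other named arguments without unpacking.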
None so far.
Nadim ibn hamouda el Khemir CPAN ID: NKH mailto: nadim@cpan.org
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
You can find documentation for this module with the perldoc command.
perldoc Search::Indexer::Incremental::MD5
You can also look for information at:
Please report any bugs or feature requests to bug-search-indexer-incremental-md5@rt.cpan.org.
We will be notified, and then you'll automatically be notified of progress on your bug as we make changes.
package Search::Indexer::Incremental::MD5::Language::Perl ;

use strict;
use warnings ;
use Carp qw(carp croak confess) ;

BEGIN
{
use Sub::Exporter -setup =>
    {
    exports => [ qw(get_perl_word_regex_and_stopwords) ],
    groups  => { all => [ qw() ], }
    };

use vars qw ($VERSION);
$VERSION = '0.03';
}

#----------------------------------------------------------------------------------------------------------

use File::stat;
use Time::localtime;
use BerkeleyDB;
use List::Util qw/sum/;
use Search::Indexer::Incremental::MD5::Indexer qw() ;
use Search::Indexer::Incremental::MD5::Searcher qw() ;
use Digest::MD5 ;
use English qw( -no_match_vars ) ;

use Readonly ;
Readonly my $EMPTY_STRING => q{} ;

#~ my @perl_extra_arguments ;
#~ @perl_extra_arguments = get_perl_word_regex_and_stopwords() if($options->{perl_mode}) ;

#~ my @stopwords ;
#~ @stopwords = (STOPWORDS => $options->{stopwords_file}) if($options->{stopwords_file}) ;

#----------------------------------------------------------------------------------------------------------

#----------------------------------------------------------------------------------------------------------

sub get_perl_word_regex_and_stopwords
{
my $id_regex = qr{
    (?![0-9])       # don't start with a digit
    \w\w+           # start with 2 or more word chars ..
    (?:::\w+)*      # .. and possibly ::some::more::components
    }smx;

my $word_regex = qr{
    (?:                      # either a Perl variable:
        (?:\$\#?|\@|\%)      # initial sigil
        (?:                  # followed by
            $id_regex        # an id
            |                # or
            \^\w             # builtin var with '^' prefix
            |                # or
            (?:[\#\$](?!\w)) # just '$$' or '$#'
            |                # or
            [^\{\w\s\$]      # builtin vars with 1 special char
        )
        |                    # or
        $id_regex            # a plain word or module name
    )
    }smx;

my @stopwords =
    (
    'a' .. 'z', '_', '0' .. '9',
    qw/
    __data__ __end__ __file__ __line__ $class $indexing_operation
    above after all also always an and any are as at
    be because been before being both but by
    can cannot could
    die do done defined do does doesn
    each else elsif eq
    for from foreach
    ge gt
    has have how
    if in into is isn it item its
    keys
    last le lt
    many may me method might must my
    ne new next no nor not
    of on only or other our
    package pl pm push
    qq qr qw
    ref return
    see shift should since so some something sub such
    than that the their them then these they this those to tr
    undef unless until up us use used uses using
    values
    was we what when which while will with would
    you your
    COPYRIGHT LICENSE
    /,
    'SEE ALSO',
    );

return(WORD_REGEX => $word_regex, STOPWORDS => \@stopwords,) ;
}

#----------------------------------------------------------------------------------------------------------

1 ;