Plucene::Analysis::LetterTokenizer - Letter tokenizer


Plucene documentation Contained in the Plucene distribution.

NAME

Plucene::Analysis::LetterTokenizer - Letter tokenizer

SYNOPSIS

	# isa Plucene::Analysis::CharTokenizer

DESCRIPTION

This is the letter tokenizer class, which divides text at non-letters.

Note: this does a decent job for most European languages, but a terrible job for some Asian languages, where words are not separated by spaces.
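As a rough illustration of what "divides text at non-letters" means, the sketch below applies the same pattern that token_re returns to a plain string. It does not use Plucene itself: the helper name letter_tokens is hypothetical and exists only for this example, and the real class works on a token stream through Plucene::Analysis::CharTokenizer rather than on a whole string at once.

```perl
use strict;
use warnings;

# Same pattern that token_re returns in this module.
my $token_re = qr/[[:alpha:]]+/;

# Hypothetical helper (not part of Plucene): return every maximal
# run of letters found in the given string. Call it in list context.
sub letter_tokens {
    my ($text) = @_;
    my @tokens = $text =~ /$token_re/g;
    return @tokens;
}

# Punctuation, whitespace and digits all act as separators.
my @tokens = letter_tokens("Hello, world! It's 2024.");
print join('|', @tokens), "\n";    # prints Hello|world|It|s
```

Note that the apostrophe splits "It's" into two tokens and that "2024" produces no token at all, since digits are not letters.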


package Plucene::Analysis::LetterTokenizer;

use strict;
use warnings;

use base 'Plucene::Analysis::CharTokenizer';

# Pattern for a single token: a maximal run of alphabetic characters.
sub token_re { qr/[[:alpha:]]+/ }

1;