| Plucene documentation | Contained in the Plucene distribution. |
Plucene::Analysis::LetterTokenizer - Letter tokenizer
# isa Plucene::Analysis::CharTokenizer
This is the letter tokenizer class, which divides text at non-letters.
Note: this does a decent job for most European languages, but does a terrible job for some Asian languages, where words are not separated by spaces.
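To illustrate what "divides text at non-letters" means in practice, here is a minimal standalone sketch. It does not use the Plucene tokenizer API; it only applies the same pattern this class returns from token_re to a plain string. The helper name letter_tokens is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Same pattern that token_re returns in this class.
my $token_re = qr/[[:alpha:]]+/;

# Hypothetical helper (not part of Plucene): return every maximal
# run of letters in the text, in order of appearance.
sub letter_tokens {
    my ($text) = @_;
    return $text =~ /($token_re)/g;
}

my @tokens = letter_tokens("Hello, world! It's 2024.");
print join('|', @tokens), "\n";    # prints "Hello|world|It|s"
```

Punctuation and digits act as separators, so "It's" yields two tokens and "2024" yields none. Text written without separators between words would come back as one long token, which is the limitation the note above describes.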
package Plucene::Analysis::LetterTokenizer;

use strict;
use warnings;

use base 'Plucene::Analysis::CharTokenizer';

sub token_re { qr/[[:alpha:]]+/ }

1;