| Hailo documentation | Contained in the Hailo distribution. |
Hailo::Role::Tokenizer - A role representing a Hailo tokenizer
new
This is the constructor. It takes no arguments.
make_tokens
Takes a line of input and returns an array reference of tokens. A token is an array reference containing two elements: a spacing attribute and the token text. The spacing attribute is an integer that is stored along with the token text in the database. The following values are currently used:
0 - normal token
1 - prefix token (no whitespace follows it)
2 - postfix token (no whitespace precedes it)
3 - infix token (no whitespace follows or precedes it)
make_output
Takes an array reference of tokens and returns a line of output. A token is an array reference as described in make_tokens. The tokens are joined together into a sentence according to the spacing attributes associated with them, as well as any formatting provided by the tokenizer implementation.
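The following is a minimal sketch of a tokenizer that follows the make_tokens/make_output contract described above. The class name My::Tokenizer::Simple and its splitting rules are hypothetical, not part of Hailo. To stay self-contained it does not load Hailo::Role::Tokenizer. A real implementation would be a class that consumes the role (for example with `with 'Hailo::Role::Tokenizer'`). The joining rule in make_output is an assumed reading of the four spacing values: insert a single space unless the left token forbids trailing whitespace or the right token forbids leading whitespace.

```perl
use strict;
use warnings;

# Hypothetical tokenizer (My::Tokenizer::Simple is not part of Hailo).
# It does not load Hailo::Role::Tokenizer so that it runs on its own;
# the spacing values below are copied from the role's documentation.
package My::Tokenizer::Simple;

use constant {
    NORMAL  => 0,    # normal token
    PREFIX  => 1,    # no whitespace follows it
    POSTFIX => 2,    # no whitespace precedes it
    INFIX   => 3,    # no whitespace follows or precedes it
};

sub new { my ($class) = @_; return bless {}, $class }

# Returns an array reference of [spacing, text] pairs.
# Assumed splitting rules: a leading "(" is a prefix token, trailing
# punctuation is a postfix token, and a hyphen inside a word is infix.
sub make_tokens {
    my ($self, $line) = @_;
    my @tokens;
    for my $raw (split ' ', $line) {
        my $word = $raw;
        push @tokens, [PREFIX, $1] if $word =~ s/^(\()//;
        my $tail = $word =~ s/([,.!?;:)]+)$// ? $1 : undef;
        for my $part (split /(-)/, $word) {
            next if $part eq '';
            push @tokens, [$part eq '-' ? INFIX : NORMAL, $part];
        }
        push @tokens, [POSTFIX, $tail] if defined $tail;
    }
    return \@tokens;
}

# Joins tokens with single spaces, except where the left token is
# prefix/infix (nothing may follow it) or the right token is
# postfix/infix (nothing may precede it).
sub make_output {
    my ($self, $tokens) = @_;
    my ($out, $prev) = ('', undef);
    for my $token (@$tokens) {
        my ($spacing, $text) = @$token;
        if (defined $prev) {
            my $glue_left  = $prev == PREFIX || $prev == INFIX;
            my $glue_right = $spacing == POSTFIX || $spacing == INFIX;
            $out .= ' ' unless $glue_left || $glue_right;
        }
        $out .= $text;
        $prev = $spacing;
    }
    return $out;
}

package main;

my $tokenizer = My::Tokenizer::Simple->new;
my $tokens    = $tokenizer->make_tokens('Hello (old) well-known friend!');
# prints: [0,Hello] [1,(] [0,old] [2,)] [0,well] [3,-] [0,known] [0,friend] [2,!]
print join(' ', map { sprintf '[%d,%s]', @$_ } @$tokens), "\n";
# prints: Hello (old) well-known friend!
print $tokenizer->make_output($tokens), "\n";
```

Because the spacing attribute is stored with each token, make_output can rebuild the original spacing without re-analysing the text. Here the input round-trips exactly.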
Hinrik Örn Sigurðsson <hinrik.sig@gmail.com>
Ævar Arnfjörð Bjarmason <avar@cpan.org>
Copyright 2010 Hinrik Örn Sigurðsson and Ævar Arnfjörð Bjarmason <avar@cpan.org>
This program is free software, you can redistribute it and/or modify it under the same terms as Perl itself.
package Hailo::Role::Tokenizer;
BEGIN {
  $Hailo::Role::Tokenizer::AUTHORITY = 'cpan:AVAR';
}
BEGIN {
  $Hailo::Role::Tokenizer::VERSION = '0.69';
}

use 5.010;
use Any::Moose '::Role';
use namespace::clean -except => 'meta';

has spacing => (
    isa     => 'HashRef[Int]',
    is      => 'rw',
    default => sub { {
        normal  => 0,
        prefix  => 1,
        postfix => 2,
        infix   => 3,
    } },
);

sub BUILD {
    my ($self) = @_;

    # This performance hack is here because calling
    # $self->spacing->{...} was a significant part of Tokenizer
    # execution time (~20s / ~1200s), since each lookup costs
    # one method call and a hash dereference.
    my $spacing = $self->spacing;
    while (my ($k, $v) = each %$spacing) {
        $self->{"_spacing_$k"} = $v;
    }
    return;
}

requires 'make_tokens';
requires 'make_output';

1;
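The BUILD method above copies every entry of the spacing hash into a flat key on the object (`_spacing_normal`, `_spacing_prefix`, and so on). This is presumably so that tokenizer code can read `$self->{_spacing_normal}` directly instead of calling `$self->spacing->{normal}`. The sketch below reproduces that pattern with a plain blessed hash instead of Any::Moose so it runs without extra modules. The class name My::SpacingCache and the method postfix_cached are hypothetical.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical stand-alone class mirroring the role's BUILD hack
# without Any::Moose: the 'spacing' hash is flattened into
# "_spacing_<name>" keys on the object itself.
package My::SpacingCache;

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        spacing => $args{spacing} // {
            normal => 0, prefix => 1, postfix => 2, infix => 3,
        },
    }, $class;

    # Same loop as the role's BUILD: one hash store per spacing type.
    my $spacing = $self->{spacing};
    while (my ($k, $v) = each %$spacing) {
        $self->{"_spacing_$k"} = $v;
    }
    return $self;
}

# Accessor-based lookup: a method call plus a hash dereference.
sub spacing { return $_[0]{spacing} }

# Cached lookup: a single hash access on the object.
sub postfix_cached { return $_[0]{_spacing_postfix} }

package main;

my $obj = My::SpacingCache->new;
print $obj->spacing->{postfix}, ' ', $obj->postfix_cached, "\n";   # prints: 2 2
```

One consequence of this design is that the cached keys are a snapshot taken when the object is built. Because spacing is declared `rw` in the role, assigning a new hash through the accessor later would not update the `_spacing_*` copies.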