| Lingua-FI-Hyphenate documentation | Contained in the Lingua-FI-Hyphenate distribution. |
Lingua::FI::Hyphenate - Finnish hyphenation (suomen tavutus)
use Lingua::FI::Hyphenate qw(tavuta);
my @tavut = tavuta("kodeissansakaan");
print "@tavut\n"; # will print "ko deis san sa kaan\n";
tavuta() takes a list of Finnish words and returns their syllables as a list.
The character set used is ISO 8859-1. Of its characters, the Finnish vowels are
aeiouyäåö AEIOUYÅÄÖ
and the consonants are
bcdfghjklmnpqrstvwxz BCDFGHJKLMNPQRSTVWXZ
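As a minimal sketch of how these two character sets can be used, the snippet below classifies single characters. The helper name classify() is hypothetical and not part of the module, and the non-ASCII letters are written as ISO 8859-1 escapes only so that the snippet stays ASCII.

```perl
use strict;
use warnings;

# ISO 8859-1 code points written as escapes (an assumption made for
# portability of this sketch): \xE4 a-umlaut, \xE5 a-ring, \xF6 o-umlaut,
# and \xC5 \xC4 \xD6 for the upper-case forms.
my $vowels     = "aeiouy\xE4\xE5\xF6AEIOUY\xC5\xC4\xD6";
my $consonants = "bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ";

# classify() is a hypothetical helper used only for illustration.
sub classify {
    my ($char) = @_;
    return 'vowel'     if $char =~ /\A[$vowels]\z/;
    return 'consonant' if $char =~ /\A[$consonants]\z/;
    return 'other';
}

# Classify each character of "t", a-umlaut, "-".
print join(" ", map { classify($_) } split //, "t\xE4-"), "\n";
# prints "consonant vowel other"
```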
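The rules for syllable divisions, as implemented in the module source, are: any run of non-letter characters is a division; there is a division before any consonant-vowel pair, except after a single word-initial consonant (the rare loanword-based initial clusters); and there is a division between two adjacent vowels that do not form a Finnish diphthong.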
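The division before a consonant-vowel pair can be sketched with a zero-width split. The pattern is the one used in the module source. The function name split_before_kv() is hypothetical, and the character classes are spelled with ISO 8859-1 escapes to keep the sketch ASCII.

```perl
use strict;
use warnings;

# Character classes as regex fragments (V = vowel, K = consonant).
my $V = "[aeiouy\xE4\xE5\xF6AEIOUY\xC5\xC4\xD6]";
my $K = "[bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ]";

# split_before_kv() is a hypothetical name for this one step.
# The lookahead makes the split zero-width, so no characters are lost.
# The negative lookbehind keeps a word-initial consonant attached to the
# consonant that follows it, as in loanwords such as "traktori".
sub split_before_kv {
    my ($word) = @_;
    return split /(?=(?<!^$K)$K$V)/, $word;
}

print join("-", split_before_kv("kodeissansakaan")), "\n";  # ko-deis-san-sa-kaan
print join("-", split_before_kv("traktori")), "\n";         # trak-to-ri
```

This covers only the consonant-vowel rule. Divisions between adjacent vowels are handled in a separate pass.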
package Lingua::FI::Hyphenate;
use strict;
use vars qw($VERSION @ISA @EXPORT_OK);

$VERSION = '0.04';

require Exporter;
@ISA       = qw(Exporter);
@EXPORT_OK = qw(tavuta);

# Hardcode the character classes instead of depending on locales.
my $v = "aeiouyäåöAEIOUYÅÄÖ";                         # vowels
my $k = "bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ";   # consonants

my $V = "[$v]";
my $K = "[$k]";

my $VU = 0;

sub tavuta {
    my (@sanat) = @_;

    my @tavut = @sanat;

    # Anything not a letter is a syllable division.
    @tavut = map { split /[^$v$k]+/ } @tavut;

    # Syllable division before any KV.
    # Exception: the rare loanword-based ^KK syllables.
    @tavut = map { split /(?=(?<!^$K)$K$V)/ } @tavut;

    # Syllable division between any VV pair
    # that is not a Finnish diphthong.
    @tavut = map { split /(.*?[aA])(?=[eoEO])/ } @tavut;
    @tavut = map { split /(.*?[eiEI])(?=[aoäöAOÄÖ])/ } @tavut;
    @tavut = map { split /(.*?[ouOU])(?=[aeAE])/ } @tavut;
    @tavut = map { split /(.*?[yäYÄ])(?=[eäEÄ])/ } @tavut;
    @tavut = map { split /(.*?[öÖ])(?=[eE])/ } @tavut;

    if ($VU) {
        # TO DO - TEKEMÄTTÄ (not done yet).
    }

    # The final split leaves an empty leading field when it matches;
    # drop empty strings before returning.
    return grep { length } @tavut;
}

1;
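The vowel-pair steps in the source rely on split returning the text captured by a group in the pattern. The sketch below shows that behaviour for one of those patterns. The word "koe" and the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

# One of the vowel-pair patterns from the module source.
my $pattern = qr/(.*?[ouOU])(?=[aeAE])/;

# split returns the captured text ("ko") as a field of its own. Because
# the match starts at offset 0, an empty leading field precedes it.
my @raw = split $pattern, "koe";

# Splitting an empty string yields no fields, so a later map/split pass
# discards the empty entry; an explicit grep does the same here.
my @syllables = grep { length } @raw;

print scalar(@raw), " raw fields, syllables: @syllables\n";
# prints "3 raw fields, syllables: ko e"
```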