| Lingua-JA-Kana documentation | Contained in the Lingua-JA-Kana distribution. |
Lingua::JA::Kana - Kata-Romaji related utilities
$Id: Kana.pm,v 0.5 2011/06/10 10:23:43 dankogai Exp dankogai $
use Lingua::JA::Kana;
my $hiragana = romaji2hiragana("ohayou");
my $katakana = romaji2katakana("ohasumi");
my $romaji = kana2romaji($str);
This module is a simple utility for converting between katakana, hiragana, and romaji. It relies on the utf8 semantics that were introduced in Perl 5.8.0 and became stable enough in Perl 5.8.1, so you need Perl 5.8.1 or later.
Also note that strings in this module must be utf8-flagged. If they are not, you can use Encode to do so.
use Encode;
use Lingua::JA::Kana;
my $romaji = kana2romaji(decode_utf8 $octet);
See Encode, perluniintro, and perlunicode for details.
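As a minimal sketch of what "utf8-flagged" means in practice, the snippet below uses only the core Encode module to turn raw UTF-8 octets into a character string of the kind this module expects. The sample bytes and variable names are illustrative assumptions, not part of the distribution.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Raw UTF-8 octets for the two katakana U+30AB U+30CA ("kana"), three bytes
# per character, as they might arrive from a file read without an encoding layer.
my $octets = "\xE3\x82\xAB\xE3\x83\x8A";

# decode_utf8 returns a character string with the utf8 flag turned on.
my $chars = decode_utf8($octets);

printf "octets: length=%d flagged=%d\n", length($octets), utf8::is_utf8($octets) ? 1 : 0;
printf "chars:  length=%d flagged=%d\n", length($chars),  utf8::is_utf8($chars)  ? 1 : 0;
```

Only the decoded string has one element per kana, which is what range-based operations such as tr/// rely on.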
This module exports the following functions by default:
Converts all occurrences of hiragana to katakana.
my $katakana = hiragana2katakana($str);
hira2kata is its alias.
Converts all occurrences of katakana to hiragana.
my $hiragana = katakana2hiragana($str);
kata2hira is its alias.
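The module source performs both conversions with a single tr/// over the kana ranges. The following is a minimal stand-alone sketch of that idea; hira2kata_sketch is a hypothetical name used for illustration and is not one of the module's functions.

```perl
use strict;
use warnings;
use utf8;    # this source contains literal kana

# Hypothetical stand-alone helper, not part of Lingua::JA::Kana.
# Hiragana U+3041..U+3093 plus U+3094 line up one-to-one with
# katakana U+30A1..U+30F3 plus U+30F4, so tr/// can map whole ranges.
sub hira2kata_sketch {
    my $str = shift;
    $str =~ tr/ぁ-んゔ/ァ-ンヴ/;
    return $str;
}

binmode STDOUT, ':encoding(UTF-8)';
print hira2kata_sketch("ひらがな and カタカナ"), "\n";
```

Characters outside the hiragana range, including text that is already katakana, pass through unchanged.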
Converts all occurrences of romaji to katakana.
my $katakana = romaji2katakana($str);
Converts all occurrences of romaji to hiragana.
my $hiragana = romaji2hiragana($str);
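In the module source, the romaji conversion runs in three passes: a doubled consonant becomes a small tsu, syllables are looked up in a table, and a leftover m or n after a kana becomes ン. The sketch below illustrates those passes with a deliberately tiny table; the table contents and the name romaji2kata_sketch are assumptions made for the example, and the real module covers far more syllables.

```perl
use strict;
use warnings;
use utf8;    # this source contains literal kana

# Hypothetical, deliberately small table (not the module's own).
my %table = (
    ka => 'カ', ki => 'キ', ko => 'コ', ta  => 'タ', te  => 'テ',
    ni => 'ニ', wa => 'ワ', ba => 'バ', chi => 'チ', shi => 'シ',
);

# Longest keys first so that three-letter syllables are tried before shorter ones.
my $alt = join '|', sort { length($b) <=> length($a) } keys %table;
my $re  = qr/(?:$alt)/i;

sub romaji2kata_sketch {
    my $str = shift;
    # Pass 1: the first of two identical consonants (n and m excluded) becomes ッ.
    $str =~ s/([bcdfghjklpqrstvwxyz])\1/ッ$1/gi;
    # Pass 2: replace every romaji syllable found in the table.
    $str =~ s/($re)/$table{lc $1}/ge;
    # Pass 3: an m or n left over directly after a kana becomes ン.
    $str =~ s/([ァ-ン])[mn]/${1}ン/gi;
    return $str;
}

binmode STDOUT, ':encoding(UTF-8)';
print romaji2kata_sketch($_), "\n" for qw(kitte konnichiwa shimbashi);
```

Running pass 3 last is what lets both "n" and "m" before a consonant, as in "shimbashi", end up as ン.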
Converts all occurrences of kana (both katakana and hiragana) to romaji.
my $romaji = kana2romaji($str);
Converts all occurrences of hankaku (half-width) to zenkaku (full-width).
my $zenkaku = hankaku2zenkaku($str);
Converts all occurrences of zenkaku (full-width) to hankaku (half-width).
my $hankaku = zenkaku2hankaku($str);
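According to the module source, both width conversions go through Encode::JP::H2Z, which works in place on EUC-JP octets. The sketch below reproduces that round trip with core modules only; the sample string and variable names are illustrative assumptions.

```perl
use strict;
use warnings;
use utf8;    # this source contains literal kana
use Encode ();
use Encode::JP::H2Z;

# Encode::JP::H2Z edits EUC-JP octets through a reference, so the character
# string is encoded first and decoded again afterwards.
my $eucjp = Encode::find_encoding('eucjp');

my $hankaku = "ｶﾀｶﾅ";                       # half-width katakana
my $octets  = $eucjp->encode($hankaku);
Encode::JP::H2Z::h2z( \$octets );            # half-width -> full-width
my $zenkaku = $eucjp->decode($octets);

binmode STDOUT, ':encoding(UTF-8)';
print "$zenkaku\n";
```

Calling Encode::JP::H2Z::z2h on the octets instead performs the opposite conversion.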
To install this module, run the following commands:
perl Makefile.PL
make
make test
make install
Dan Kogai, <dankogai at dan.co.jp>
Please report any bugs or feature requests to bug-lingua-ja-kana at rt.cpan.org, or through
the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Lingua-JA-Kana. I will be notified, and then you'll
automatically be notified of progress on your bug as I make changes.
You can find documentation for this module with the perldoc command.
perldoc Lingua::JA::Kana
You can also look for information at:
Copyright 2007 Dan Kogai, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
package Lingua::JA::Kana; use warnings; use strict; use utf8; our $VERSION = sprintf "%d.%02d", q$Revision: 0.5 $ =~ /(\d+)/g; require Exporter; use base qw/Exporter/; our @EXPORT = qw( hira2kata hiragana2katakana kata2hira katakana2hiragana romaji2hiragana romaji2katakana kana2romaji hankaku2zenkaku zenkaku2hankaku ); our $USE_REGEXP_ASSEMBLE = do { eval 'require Regexp::Assemble'; $@ ? 0 : 1; }; our $Re_Vowels = qr/[aeiou]/i; our $Re_Consonants = qr/[bcdfghjklpqrstvwxyz]/i; # note the absense of n and m our %Kata2Hepburn = qw( 㢠a 㤠i 㦠u 㨠e 㪠o ã¡ xa 㣠xi 㥠xu ã§ xe ã© xo ã« ka ã ki 㯠ku ã± ke ã³ ko 㬠ga ã® gi ã° gu ã² ge ã´ go ã㣠kya ã㥠kyu ãã§ kyo ãµ sa ã· shi ã¹ su ã» se ã½ so ã¶ za 㸠ji 㺠zu ã¼ ze ã¾ zo ã·ã£ sha ã·ã¥ shu ã·ã§ sho ã¸ã£ ja ã¸ã¥ ju ã¸ã§ jo ã¿ ta ã chi ã tsu ã te ã to ã㣠ti ã㥠tu ã da ã㣠di ã㥠du ã de ã do ã dhi ã dhu ã㣠cha ã㥠chu ãã§ che ãã§ cho ã㣠dha ã㥠dhu ãã§ dhe ãã§ dho ã na ã ni ã nu ã ne ã no ã ha ã hi ã fu ã he ã ho ã㣠hya ã㥠hyu ãã§ hyo ã ba ã bi ã bu ã be ã bo ã㣠bya ã㥠byu ãã§ byo ã pa ã pi ã pu ã pe ã po ã㣠pya ã㥠pyu ãã§ pyo ãã¡ fa ã㣠fi ãã§ fe ãã© fo ã ma ã mi ã mu ã¡ me 㢠mo 㤠ya 㦠yu ã¤ã§ ye 㨠yo 㣠xya 㥠xyu ã§ xyo ã© ra 㪠ri ã« ru 㬠re ã ro ãªã£ rya ãªã¥ ryu ãªã§ ryo 㯠wa ã° wi ã± we ã² wo ã¦ã¡ wa ã¦ã£ wi ã¦ã§ we ã¦ã© wo ã´ã¡ va ã´ã£ vi ã´ vu ã´ã§ ve ã´ã© vo ã³ n ); our %Kana2Hepburn = ( %Kata2Hepburn, map { katakana2hiragana($_) } %Kata2Hepburn ); our $Re_Kana2Hepburn = do { if ($USE_REGEXP_ASSEMBLE) { my $ra = Regexp::Assemble->new(); $ra->add($_) for keys %Kana2Hepburn; $ra->re; } else { my $str = join '|', keys %Kana2Hepburn; qr/(?:$str)/; } }; our %Romaji2Kata = qw( a 㢠i 㤠u 㦠e 㨠o 㪠xa ã¡ xi 㣠xu 㥠xe ã§ xo ã© ka ã« ki ã ku 㯠ke ã± ko ã³ ga 㬠gi ã® gu ã° ge ã² go ã´ kya ã㣠kyu ã㥠kyo ãã§ sa ãµ shi ã· su ã¹ se ã» so ã½ si ã· za ã¶ ji 㸠zu 㺠ze ã¼ zo ã¾ zi 㸠sha ã·ã£ shu ã·ã¥ sho ã·ã§ ja ã¸ã£ ju ã¸ã¥ jo ã¸ã§ sya ã·ã£ syu ã·ã¥ syo ã·ã§ ta ã¿ chi ã tsu ã te ã to ã xtu ã ti ã㣠tu ã㥠da ã di ã㣠du ã㥠de ã do ã dhi ã dhu ã cha ã㣠chu ã㥠che ãã§ cho ãã§ tya ã㣠tyu ã㥠tye ãã§ tyo ãã§ 
dha ã㣠dhu ã㥠dhe ãã§ dho ãã§ dya ã㣠tyu ã㥠tye ãã§ tyo ãã§ na ã ni ã nu ã ne ã no ã ha ã hi ã fu ã he ã ho ã hu ã hya ã㣠hyu ã㥠hyo ãã§ ba ã bi ã bu ã be ã bo ã bya ã㣠byu ã㥠byo ãã§ pa ã pi ã pu ã pe ã po ã pya ã㣠pyu ã㥠pyo ãã§ fa ãã¡ fi ã㣠fe ãã§ fo ãã© ma ã mi ã mu ã me ã¡ mo 㢠ya 㤠yu 㦠ye ã¤ã§ yo 㨠xya 㣠xyu 㥠xyo ã§ ra ã© ri 㪠ru ã« re 㬠ro ã rya ãªã£ ryu ãªã¥ ryo ãªã§ la ã© li 㪠lu ã« le 㬠lo ã wa 㯠wo ã² wi ã¦ã£ we ã¦ã§ va ã´ã¡ vi ã´ã£ vu ã´ ve ã´ã§ vo ã´ã© ); our $Re_Romaji2Kata = do { if ($USE_REGEXP_ASSEMBLE) { my $ra = Regexp::Assemble->new(); $ra->add($_) for keys %Romaji2Kata; my $str = $ra->re; substr( $str, 0, 8, '' ); # remove '(?-xism:' substr( $str, -1, 1, '' ); # and ')'; qr/$str/i; # and recompile with i } else { my $str = join '|', sort {length($b) <=> length($a)} keys %Romaji2Kata; qr/(?:$str)/i; } }; our %Kana2Romaji = %Kana2Hepburn; our $Re_Kana2Romaji = $Re_Kana2Hepburn; sub katakana2hiragana{ my $str = shift; $str =~ tr/ã¡-ã³ã´/ã-ãã/; $str; } sub hiragana2katakana{ my $str = shift; $str =~ tr/ã-ãã/ã¡-ã³ã´/; $str; } { no warnings 'once'; *kata2hira = \&katakana2hiragana; *hira2kata = \&hiragana2katakana; } sub romaji2katakana{ my $str = shift; # step 1; tta -> ãta $str =~ s{ ($Re_Consonants) \1 }{ "ã$1" }msxgei; # step 2; $str =~ s{ ($Re_Romaji2Kata) }{ $Romaji2Kata{lc $1} || $1 }msxgei; # step 3; $str =~ s{ ([ã¡-ã³])[mn] }{ "$1ã³" }msxgei; $str; } sub romaji2hiragana{ katakana2hiragana(romaji2katakana(shift)) }; sub kana2romaji{ my $str = shift; # step 1; $str =~ s{ ($Re_Kana2Romaji) }{ $Kana2Romaji{$1} || $1 }msxge; # step 2; ãta -> tta $str =~ s{ [ã£ã]($Re_Consonants) }{ "$1$1" }msxge; # step 3; oã¼ -> oo $str =~ s{ ($Re_Vowels)ã¼ }{ "$1$1" }msxge; $str; } if ($0 eq __FILE__){ warn $USE_REGEXP_ASSEMBLE; binmode STDOUT, ':utf8'; local $\ = "\n"; warn $Re_Romaji2Kata; print romaji2katakana("Dan Kogai"); print romaji2katakana("shimbashi"); print romaji2katakana("konnichiwa"); print romaji2hiragana("Dan Kogai"); print romaji2hiragana("shimbashi"); warn 
$Re_Kana2Romaji; print kana2romaji("ãã³ã³ã¬ã¤"); print kana2romaji("ãã¤ãã¿"); print kana2romaji("ã·ã³ãã·"); print romaji2hiragana("ryoukai"); # RT#39590 print romaji2hiragana("virama"); # RT#45402 } use Encode; use Encode::JP::H2Z; my $eucjp = Encode::find_encoding('eucjp'); sub hankaku2zenkaku { my $str = $eucjp->encode(shift); Encode::JP::H2Z::h2z(\$str); $eucjp->decode($str); } sub zenkaku2hankaku { my $str = $eucjp->encode(shift); Encode::JP::H2Z::z2h(\$str); $eucjp->decode($str); } 1; # End of Lingua::JA::Kana __END__