Text::Phonetic::Phonem - Phonem algorithm


Text-Phonetic documentation Contained in the Text-Phonetic distribution.

Index


Code Index:

NAME

Top

Text::Phonetic::Phonem - Phonem algorithm

DESCRIPTION

Top

The PHONEM algorithm is a simple substitution algorithm that was originally implemented in dBase.

Implementation of the PHONEM substitutions, as described in Georg Wilde and Carsten Meyer, "Doppelgaenger gesucht - Ein Programm fuer kontextsensitive phonetische Textumwandlung" from ct Magazin fuer Computer & Technik 25/1999.

The original variant was implemented as X86-Assembler-Funktion. This implementation does not try to mimic the original code, though it should achieve equal results. As the original software used for building the original implementation was not available, there was no testing for correctness, other than the examples given in the article.

The Perl implementation was written by Martin Wilz (http://wilz.de/view/Themen/MagisterArbeit)

AUTHOR

Top

    Martin Wilz 
    http://wilz.de/view/Themen/MagisterArbeit

    Maroš Kollár
    CPAN ID: MAROS
    maros [at] k-1.com
    http://www.k-1.com

COPYRIGHT

Top

SEE ALSO

Top


Text-Phonetic documentation Contained in the Text-Phonetic distribution.

# ============================================================================
package Text::Phonetic::Phonem;
# ============================================================================
use utf8;

use Moose;
extends qw(Text::Phonetic);

__PACKAGE__->meta->make_immutable;

our $VERSION = $Text::Phonetic::VERSION;

our %DOUBLECHARS = (
    SC  =>'C', 
    SZ  =>'C', 
    CZ  =>'C', 
    TZ  =>'C', 
    SZ  =>'C', 
    TS  =>'C',
    KS  =>'X', 
    PF  =>'V', 
    QU  =>'KW', 
    PH  =>'V', 
    UE  =>'Y', 
    AE  =>'E',
    OE  =>'Ö', 
    EI  =>'AY', 
    EY  =>'AY', 
    EU  =>'OY', 
    AU  =>'A§', 
    OU  =>'§ '
);

sub _do_encode {
    my ($self,$string) = @_;
    
    $string = uc($string);
    $string =~ tr/A-Z//cd;

    # Iterate over two character substitutions
    foreach my $index (0..((length $string)-2)) {
        if ($DOUBLECHARS{substr $string,$index,2}) {
            substr ($string,$index,2) = $DOUBLECHARS{substr $string,$index,2};
        }
    }
    
    # Single character substitutions via tr
    $string =~tr/ZKGQIJFWPT§/CCCCYYVBDUA/;
    
    #delete forbidden characters
    $string =~tr/ABCDLMNORSUVWXY//cd;
    
    #remove double chars
    $string =~tr/ABCDLMNORSUVWXY//s;
    
    return $string;
}

1;