WWW::Translate::Apertium - Open source machine translation


WWW-Translate-Apertium documentation  | view source Contained in the WWW-Translate-Apertium distribution.


NAME

WWW::Translate::Apertium - Open source machine translation

VERSION

Version 0.16 September 6, 2010

SYNOPSIS

    use WWW::Translate::Apertium;

    my $engine = WWW::Translate::Apertium->new();

    my $translated_string = $engine->translate($string);

    # default language pair is Catalan -> Spanish
    # change to Spanish -> Galician:
    $engine->from_into('es-gl');

    # check current language pair:
    my $current_langpair = $engine->from_into();

    # get available language pairs:
    my %pairs = $engine->get_pairs();

    # default output format is 'plain_text'
    # change to 'marked_text':
    $engine->output_format('marked_text');

    # check current output format:
    my $current_format = $engine->output_format();

    # replace the engine with a new Apertium object that stores unknown
    # words (this requires the 'marked_text' output format):
    $engine = WWW::Translate::Apertium->new(
                                             lang_pair => 'oc_aran-ca',
                                             output => 'marked_text',
                                             store_unknown => 1,
                                           );

    # get unknown words for source language = Aranese
    my $unknown_href = $engine->get_unknown('oc_aran');

DESCRIPTION

Apertium is an open source shallow-transfer machine translation engine. It was originally designed to translate between closely related languages and has since been extended to handle less related language pairs. It is being developed by the Department of Software and Computing Systems at the University of Alicante. The linguistic data is being developed by research teams from the University of Alicante, the University of Vigo and the Pompeu Fabra University. For more details, see http://www.apertium.org/.

WWW::Translate::Apertium provides an object-oriented interface to the Apertium online machine translation web service, based on Apertium 3.0.

Currently, Apertium supports the following language pairs:

- Bidirectional

* Aranese < > Catalan
* Bulgarian < > Macedonian
* Catalan < > English
* Catalan < > French
* Catalan < > Occitan
* Catalan < > Portuguese
* Catalan < > Spanish
* French < > Spanish
* English < > Galician
* English < > Spanish
* English < > Esperanto
* Galician < > Portuguese
* Galician < > Spanish
* Norwegian Bokmål < > Norwegian Nynorsk
* Occitan < > Spanish
* Portuguese < > Spanish

- Single Direction

* Basque > Spanish
* Breton > French
* Catalan > Esperanto
* Icelandic > English
* Romanian > Spanish
* Spanish > Asturian
* Spanish > Brazilian Portuguese
* Spanish > Catalan (Valencian)
* Spanish > Esperanto
* Swedish > Danish
* Welsh > English

CONSTRUCTOR

new()

Creates and returns a new WWW::Translate::Apertium object.

    my $engine = WWW::Translate::Apertium->new();

WWW::Translate::Apertium recognizes the following parameters:

* lang_pair

The valid values of this parameter are listed below, grouped by source language:

Aranese into:

* Catalan -- oc_aran-ca

Basque into:

* Spanish -- eu-es

Breton into:

* French -- br-fr

Bulgarian into:

* Macedonian -- bg-mk

Catalan into:

* Aranese -- ca-oc_aran
* English -- ca-en
* Esperanto -- ca-eo
* French -- ca-fr
* Occitan -- ca-oc
* Spanish -- ca-es

English into:

* Catalan -- en-ca
* Esperanto -- en-eo
* Galician -- en-gl
* Spanish -- en-es

Esperanto into:

* English -- eo-en

French into:

* Catalan -- fr-ca
* Spanish -- fr-es

Galician into:

* English -- gl-en
* Spanish -- gl-es

Icelandic into:

* English -- is-en

Macedonian into:

* Bulgarian -- mk-bg

Norwegian Bokmål into:

* Norwegian Nynorsk -- nb-nn

Norwegian Nynorsk into:

* Norwegian Bokmål -- nn-nb

Occitan into:

* Catalan -- oc-ca
* Spanish -- oc-es

Portuguese into:

* Catalan -- pt-ca
* Galician -- pt-gl
* Spanish -- pt-es

Romanian into:

* Spanish -- ro-es

Spanish into:

* Asturian -- es-ast
* Brazilian Portuguese -- es-pt_BR
* Catalan -- es-ca
* English -- es-en
* Esperanto -- es-eo
* French -- es-fr
* Galician -- es-gl
* Portuguese -- es-pt

Swedish into:

* Danish -- sv-da

Welsh into:

* English -- cy-en

These language pairs are stable versions. Other language pairs are currently under development.

* output

The valid values of this parameter are:

* plain_text

Returns the translation as plain text (default value).

* marked_text

Returns the translation with the unknown words marked with an asterisk.

Warning: This feature is always on in the current version of the Catalan < > French language pair due to a bug in the stable package for these languages. It will be fixed in the next release.

* store_unknown

Off by default. If set to a true value, the engine object stores the unknown words found during the session, together with their frequencies, in a hash. You can access this hash later through the get_unknown method. If you change the engine language pair during the same session, a separate word list is created for the new source language.

IMPORTANT: If you activate this setting, then you must also set the output parameter to marked_text. Otherwise, the get_unknown method will return an empty hash.

The default parameter values can be overridden when creating a new Apertium engine object:

    my %options = (
                    lang_pair => 'es-ca',
                    output => 'marked_text',
                    store_unknown => 1,
                  );

    my $engine = WWW::Translate::Apertium->new(%options);
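
The constraint described under store_unknown (it only works together with marked_text) is easy to overlook. The following is a minimal sketch, not part of the module: engine_options is a hypothetical helper that fills in the documented defaults and forces the output parameter to marked_text whenever store_unknown is true.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by WWW::Translate::Apertium):
# merges caller options over the documented defaults and makes sure
# that store_unknown is never combined with plain_text output.
sub engine_options {
    my (%args) = @_;

    my %options = (
        lang_pair     => 'ca-es',       # documented default: Catalan -> Spanish
        output        => 'plain_text',  # documented default
        store_unknown => 0,             # off by default
        %args,                          # caller values override the defaults
    );

    # get_unknown returns an empty hash unless output is marked_text
    $options{output} = 'marked_text' if $options{store_unknown};

    return %options;
}

my %options = engine_options(lang_pair => 'es-ca', store_unknown => 1);

# With the module installed, the hash can be passed straight to new():
#     my $engine = WWW::Translate::Apertium->new(%options);
print "$_ => $options{$_}\n" for sort keys %options;
```

The helper only prepares the option hash, so it can be tried out without network access.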

METHODS

$engine->translate($string)

Returns the translation of $string generated by Apertium, encoded as UTF-8. If the server is down, the translate method issues a warning and returns undef.

The input $string must be a UTF-8 encoded string (for this task you can use the Encode module or the PerlIO layer, if you are reading the text from a file).

If you translate a string literal embedded in your code and print the result to a terminal or to the output window of a code editor, add the following statement to your code to avoid a "Wide character in print" warning:

    binmode(STDOUT, ':utf8');
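
Because translate returns undef when the server cannot be reached, calling code should check the result before using it. The sketch below shows one possible way to do that. translate_or_keep is a hypothetical wrapper, and the two code references are stand-ins for a real engine so that the example runs without network access.

```perl
use strict;
use warnings;

# Hypothetical wrapper: $translate is any code reference that behaves
# like $engine->translate, i.e. returns the translation or undef.
sub translate_or_keep {
    my ($translate, $string) = @_;

    my $translated = $translate->($string);

    # Fall back to the untranslated input when no translation came back
    return defined $translated ? $translated : $string;
}

# Stand-ins for the real engine (assumptions for illustration only)
my $server_up   = sub { return uc $_[0] };  # pretend "translation"
my $server_down = sub { return };           # undef, as on a server failure

print translate_or_keep($server_up,   'bon dia'), "\n";
print translate_or_keep($server_down, 'bon dia'), "\n";

# With a real engine:
#     my $text = translate_or_keep(sub { $engine->translate($_[0]) }, $string);
```

Keeping the original text is only one choice of fallback. Depending on the application, skipping the string or retrying later may be more appropriate.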




$engine->from_into($lang_pair)

Changes the engine language pair to $lang_pair. When called with no argument, it returns the value of the current engine language pair.
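
A language pair code consists of the source language code and the target language code joined by a hyphen. As a small sketch, the hypothetical helper split_pair below (not part of the module) recovers both parts, which is handy when you need the source language code of the current pair.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a language pair code at its first hyphen.
# The language codes listed in this document may contain underscores
# (oc_aran, pt_BR) but never hyphens.
sub split_pair {
    my ($lang_pair) = @_;
    my ($source, $target) = split /-/, $lang_pair, 2;
    return ($source, $target);
}

for my $pair (qw(es-gl oc_aran-ca es-pt_BR)) {
    my ($source, $target) = split_pair($pair);
    print "$pair: source=$source target=$target\n";
}
```

With a real engine, the argument would be the value returned by $engine->from_into().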

$engine->get_pairs()

Returns a hash containing the available language pairs. The hash keys are the language codes, and the values are the corresponding language names.

$engine->output_format($format)

Changes the engine output format to $format. When called with no argument, it returns the value of the current engine output format.
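
When the output format is marked_text, the unknown words in the translation carry an asterisk. The following sketch shows how such a string could be post-processed. Both helpers are hypothetical, the sample string is made up, and the code assumes that the asterisk is prefixed to each unknown word.

```perl
use strict;
use warnings;

# Hypothetical helpers for text returned in 'marked_text' format.
# Assumption: each unknown word is prefixed with a single asterisk.
# Note: \w covers accented letters only when Perl treats $text as a
# character string.

# Returns the unknown words, without asterisks, in order of appearance
# (call it in list context)
sub marked_words {
    my ($text) = @_;
    return $text =~ /\*(\w+)/g;
}

# Returns a copy of the text with the asterisk marks removed
sub unmark {
    my ($text) = @_;
    (my $clean = $text) =~ s/\*(?=\w)//g;
    return $clean;
}

my $sample = 'The *foobar sat on the *quux.';  # made-up example string

print join(', ', marked_words($sample)), "\n";
print unmark($sample), "\n";
```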

$engine->get_unknown($lang_code)

If the engine was configured to store unknown words, it returns a reference to a hash containing the unknown words (keys) detected during the current machine translation session for the specified source language, along with their frequencies (values).

The valid values of $lang_code for the source language are (in alphabetical order):

* bg -- Bulgarian
* br -- Breton
* ca -- Catalan
* cy -- Welsh
* en -- English
* eo -- Esperanto
* es -- Spanish
* eu -- Basque
* fr -- French
* gl -- Galician
* is -- Icelandic
* mk -- Macedonian
* nb -- Norwegian Bokmål
* nn -- Norwegian Nynorsk
* oc -- Occitan
* oc_aran -- Aranese
* pt -- Portuguese
* ro -- Romanian
* sv -- Swedish
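
The returned hash reference maps each unknown word to its frequency, so it can be turned into a ranked word list with a plain sort. The sketch below uses made-up sample data in place of a real call to get_unknown.

```perl
use strict;
use warnings;

# Sample data in the shape described above: word => frequency.
# With a real engine it would come from $engine->get_unknown('es').
my $unknown_href = {
    'tuitear'  => 3,
    'bloguero' => 1,
    'wifi'     => 3,
};

# Most frequent first; ties are broken alphabetically
my @by_frequency = sort {
    $unknown_href->{$b} <=> $unknown_href->{$a} || $a cmp $b
} keys %{$unknown_href};

printf "%-10s %d\n", $_, $unknown_href->{$_} for @by_frequency;
```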

DEPENDENCIES

LWP::UserAgent

URI::Escape

HTML::Entities

SEE ALSO

WWW::Translate::interNOSTRUM

REFERENCES

Apertium project website:

http://www.apertium.org/

If you would rather run Apertium itself, you can download the Apertium code and build it on your local machine. You will find detailed setup instructions in the Apertium wiki:

http://wiki.apertium.org/wiki/Installation

ACKNOWLEDGEMENTS

Many thanks to Mikel Forcada Zubizarreta, coordinator of the Transducens research team of the Department of Software and Computing Systems at the University of Alicante, who kindly answered my questions during the development of this module, and to Xavier Noria, João Albuquerque, and Kevin Brubeck Unhammer for useful suggestions. The author is also grateful to Francis Tyers, a member of the Apertium team who provided essential feedback for the latest versions of this module.

AUTHOR

Enrique Nell, <blas.gordon at gmail.com>

BUGS

Please report any bugs or feature requests to bug-www-translate-apertium at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=WWW-Translate-Apertium. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc WWW::Translate::Apertium

You can also look for information at:

* RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=WWW-Translate-Apertium

* AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/WWW-Translate-Apertium

* CPAN Ratings

http://cpanratings.perl.org/d/WWW-Translate-Apertium

* Search CPAN

http://search.cpan.org/dist/WWW-Translate-Apertium/

COPYRIGHT AND LICENSE
