Lingua::Translate - Translate text from one language to another


Lingua-Translate documentation Contained in the Lingua-Translate distribution.

Index


Code Index:

NAME

Top

Lingua::Translate - Translate text from one language to another

SYNOPSIS

Top

 use Lingua::Translate;

 my $xl8r = Lingua::Translate->new(src => "en",
                                   dest => "de")
     or die "No translation server available for en -> de";

 my $english = "I would like some cigarettes and a box of matches";

 my $german = $xl8r->translate($english); # dies or croaks on error

 # prints "Mein Luftkissenfahrzeug ist voll von den Aalen";
 print $german;

DESCRIPTION

Top

Locale::Translate translates text from one written language to another. Currently this is implemented by contacting Babelfish (http://babelfish.altavista.com/), so see there for the language pairs that are supported. Babelfish uses SysTran (http://www.systran.org/) to perform the translation, and contacting a SysTran translation server directly is also supported (in case your translation needs grow beyond babelfish' capacity).

OVERVIEW

Top

To translate text, you first have to obtain a translation "handle" for the language pair (source language, destination language) that you are translating, using a constructor (see CONSTRUCTORS, below). This is returned as a perl object. You can then use this handle to translate arbitrary text, using the "translate" method (see METHODS, below).

Depending on the back-end that you are using, either the constructor or the translation will open a connection to a translation server. If there are any network errors or timeouts, then an exception will be thrown. If you want to check for this type of error, you will need to wrap both the constructor and the translation function in an eval { } block.

If you are using a systrans server, you will need to use the "config" function to tell this module where your translation server is running, and the port that it is listening on.

Translating is generally a heavily expensive task; you should try to save the results you get back from this module somewhere so that you do not overload Babelfish.

CONSTRUCTORS

Top

new(src => $lang, dest => $lang)

This function creates a new translation handle and returns it. It takes the following construction options, passed as Option => "value" pairs:

src

The source language, in RFC3066 form. See I18N::LangTags. There is no default.

dest

The destination language in the same form. There is no default.

Additionally, any configuration option that is normally passed to the "config" function (see below) may be passed to the "new" constructor as well.

METHODS

Top

translate($text) : $text

Translates $text and returns the translated text. die on any error.

CONFIGURATION FUNCTIONS

Top

This collection of functions configures general operation of the Lingua::Translate module, which is stored in package scoped variables.

These options only affect the construction of new objects, not the operation of existing objects.

load_back_end($backend)

This function loads the specified back-end. Used internally by config(). Returns the package name of the backend.

config(option => $value)

This function sets defaults for use when constructing new objects; it does not affect already constructed objects.

back_end

This specifies the method to use for translation. Currently supported values are "Babelfish" and "SysTran". The case is significant.

The default value is "Babelfish".

Setting this option will "use" the appropriate back-end module from Lingua::Translate::*, which should be a derived class of Lingua::Translate.

If the configuration option requested is not found, and a back-end is configured, then that back-end's config function is called with the options.

src_enc

Character set encoding assumed for input text. If the Unicode::MapUTF8 module is not available, this option has no effect and strings are passed on to the translation back end without processing.

The default value is "utf8".

dest_enc

Character set encoding assumed for returned text. If the Unicode::MapUTF8 module is not available, this option has no effect and strings are passed back from the translation back end as is.

The default value is "utf8".

This function can also be called as an instance method (ie $object->config(name => value), in which case it configures that object only.

BUGS/TODO

Top

No mechanism for backends registering which language pairs they have along with a priority, so that the most efficient back-end for a translation can be selected automatically.

Some much shorter invocation rules, suitable for one liners, etc.

I don't have access to a non-European character set version of SysTran, so translation to/from non-ISO-8859-1 character sets is not currently possible.

Need to formalise and define the "Interface" that the back-end modules adhere to.

SEE ALSO

Top

Lingua::Translate::Babelfish, LWP::UserAgent, Unicode::MapUTF8

The original interface to the fish - WWW::Babelfish, by Daniel J. Urist <durist@world.std.com>

AUTHOR

Top

Sam Vilain, <samv@cpan.org>


Lingua-Translate documentation Contained in the Lingua-Translate distribution.
#!/usr/bin/perl -w

package Lingua::Translate;

use strict;
use Carp;

use vars qw($VERSION $defaults);
$VERSION = '0.09';

# I'm not sure whether the "src", "dest" options should be hard coded
# like this.  Perhaps they should just be treated as configuration
# options.  But I think they deserve special treatment.

use I18N::LangTags qw(is_language_tag);

# HACK to do without Unicode::MapUTF8, as it can be very hard to
# install, particularly on older Perls
use vars qw($have_map_utf8);
BEGIN {
    eval "use Unicode::MapUTF8 qw(from_utf8 to_utf8 "
	."utf8_supported_charset);";
    $have_map_utf8 = 1 unless $@;
}

sub new {
    my $class = shift;
    my %options = @_;

    my $self = bless { }, $class;

    croak "Must supply source and destination language"
	unless (defined $options{src} and defined $options{dest});

    is_language_tag($self->{src} = delete $options{src})
	or croak "$self->{src} is not a valid RFC3066 language tag";

    is_language_tag($self->{dest} = delete $options{dest})
	or croak "$self->{dest} is not a valid RFC3066 language tag";

    # allow custom back end
    $self->{back_end} = load_back_end(delete $options{back_end}
				      || $defaults->{back_end});

    # allow encoding
    for my $option ( qw(src_enc dest_enc) ) {
	$self->config($option => delete $options{$option})
	    if (exists $options{$option});
    }

    $self->{back_end_options} = \%options if (keys %options);

    bless $self, $class;
}

# creates the $self->{worker} attribute

sub _create_worker {

    my $self = shift;

    # For flexibility, we allow two methods of configuration; if
    # the module defines &save_config(), then we call that
    # function, then call config(%options), new(), then
    # &restore_config().
    if ( $self->{back_end_options} and
	 my $code_ref = $self->{back_end}->can("save_config") ) {
	my $saved_config = $code_ref->();
	$self->{back_end}->config(%{$self->{back_end_options}});
	$self->{worker} =
	    $self->{back_end}->new( src => $self->{src},
				    dest => $self->{dest} );
	$self->{back_end}->restore_config($saved_config);
    } else {
	# If they don't define that function, then we just call
	# their &new() function with the remaining options as a
	# parameter
	$self->{worker} =
	    $self->{back_end}->new( src => $self->{src},
				    dest => $self->{dest},
				    %{$self->{back_end_options}||{}}
				  );
    }

}

sub translate {
    my ($self, $text) = map { shift } (1..2);

    my $source_encoding = $self->{src_enc} || $defaults->{src_enc};

    # BACK-ENDS MUST DEAL IN utf8
    if ( $have_map_utf8 and lc($source_encoding) ne "utf8" ) {
	$text = to_utf8( -string => $text,
			 -charset => $source_encoding);
    }

    # create a new worker if we need to
    $self->_create_worker unless $self->{worker};

    my $translated;
    for (1..3) {
        eval { $translated = $self->{worker}->translate($text) };
        last unless $@;
    }

    if ( $translated ) {
	my $dest_encoding = $self->{dest_enc} || $defaults->{dest_enc};
	if ( $have_map_utf8 and lc($dest_encoding) ne "utf8" ) {
	    $translated = from_utf8( -string => $translated,
				     -charset => $dest_encoding);
	}
	return $translated;
    } else {
	die "Translation back-end failed; $@";
    }
}

sub load_back_end {
    my ($back_end) = (@_);

    if ( $back_end !~ m/::/ ) {
	$back_end = "Lingua::Translate::$back_end";
    }
    eval "use $back_end;";
    if ( $@ ) {
	croak "Back end $back_end not available; $@";
    }

    return $back_end;
}

sub config {

    my ($self, $target);
    if ( UNIVERSAL::isa($_[0], __PACKAGE__) ) {
	$self = shift;
    } else {
	$self = $defaults ||= {};
    }

    while ( my ($option, $value) = splice @_, 0, 2 ) {

	if ( $option eq "back_end" ) {

	    # the user is selecting a back end.  Load it
	    $self->{back_end} = load_back_end($value);

	    # if there is a worker, kill it - it will be re-created on
	    # the next translate() call.
	    delete $self->{worker};

	} elsif ( $option eq "src_enc" ) {
	    # set the source encoding

	    croak "Charset `$value' not supported by Unicode::MapUTF8"
		if ($have_map_utf8 and
		    not utf8_supported_charset($value));

	    $self->{src_enc} = $value;

	} elsif ( $option eq "dest_enc" ) {

	    croak "Charset `$value' not supported by Unicode::MapUTF8"
		if ($have_map_utf8 and
		    not utf8_supported_charset($value));

	    $self->{dest_enc} = $value;

	} elsif (
		 my $code_ref = UNIVERSAL::can ($self->{back_end},
					       "config")
		) {

	    # call the back-end's configuration function
	    $code_ref->($option => $value);

	} else {
	    croak "Unknown configuration option $option";
	}
    }
}

# extract the default values from the POD
use Pod::Constants
    'CONFIGURATION FUNCTIONS' => sub {
	Pod::Constants::add_hook
		('*item' => sub {
		     my ($varname) = m/(\w+)/;
		     my ($default) = m/The default value is "(.*)"\./;
		     config($varname => $default);
		 }
		);
	Pod::Constants::add_hook
		(
		 '*back' => sub {
		     Pod::Constants::delete_hook('*item');
		     Pod::Constants::delete_hook('*back');
		 }
		);
    };

4;