| Locale-MakePhrase documentation | Contained in the Locale-MakePhrase distribution. |
Locale::MakePhrase::RuleManager - Language rule sort and evaluation object.
The Locale::MakePhrase module uses this plugin module, to implement the evaluation and sorting phases of text selection. It explains the rule expression syntax and evaluation procedure.
It sorts the language rules into a suitable order so that we can figure out which rule to select, ie. the aim is to sort the rules into an order so that we can select the first rule.
It evaluates the program arguments against the expressions in the rule.
To allow an argument to be placed within the middle of a string, we use square brackets as the notation, plus an underscore, then the argument number, as in:
"Please select [_1] files"
where [_1] refers to the first program variable supplied as an argument with the text to be translated.
To display square brackets within the text string, you will need to escape the square bracket by using the ~ (tilde) character, as in:
"This is ~[ bracketed text ~]"
this will print:
This is [ bracketed text ]
Of course, if you need to display the ~ character, you will need to use two of them, as in:
"Tilde needs escaping as in ~~"
which ends up printing:
Tilde needs escaping as in ~
We have coined the term linguistic rules as a means to describe the technique which decides which piece of text is displayed, for a given input text phrase and any/all program arguments.
To understand why we need to generate linguistic rules, consider the 'singular vs. plural' example shown in the REQUIREMENTS in Locale::MakePhrase section.
In this example, we needed four different text strings, for the trivial case of what to display for a given program value.
For other examples, the URL's mentioned in that section describe why there is a need for applying rules on a per-language basis (they also describe why most current language translation systems fail).
A linguistic rule is the evaluation of the context of a phrase by using program arguments, for a given program string. The arguments are evaluated left-to-right and top-to-bottom. The first rule to succeed has its corresponding translated text applied in-place of the input text.
Note that if a program string takes no arguments, the rule becomes rather simplistic (in that no arguments need to be evaluated).
Rules can be tested in a number of ways. The 'Operators' and 'Functions' sections list the available rule expression conjigates available for use within each rule expression.
Previously we mentioned that the language translation system used syntax with the form [_1]. You will notice that we use an underscore in the placeholder. This may appear to be meaningless, but as we will see, we use this rule property to help understand how rules are evaluated.
Let's show an example of a simple expression:
_1 == 0
The use of the underscore signifies that this value is to be classified as an argument number and is not to be treated literally. This expression says, 'Does the first argument have a value equal to zero?'
[Note that we use double-equals; the double-equals operator will use a numeric context in the equality test.]
Since an argument can also be a string, we could define an expression to be:
_1 eq "some text"
Notice that we use a different operator depending on whether the argument is numeric or a string. This is because we need to be able to figure out what context the argument needs to be evaluated in.
[In this case we use 'eq' as the text context equality operator.]
In some cases we need to be able to specify the translated string, based on an alternate representation of the argument. This is handled by using a function. For example, you may use the term 'houses', which is the main keyword within your application.
To handle alternations of the word 'houses' (such as 'house') we can define an expression of:
left(_1,5) eq 'house'
However, in some cases we will use the terms 'apartments' or 'flats'. In these cases, we only care if the value is in the plural or singular case:
right(_1,1) eq "s"
Thus, we are provided with a set of functions which allow some manipulation of the argument value, prior to evaluation.
In many cases, more than one argument is supplied (as well as the text to translate) to Locale::MakePhrase. In those cases, an expression can be created which tests each argument, as in:
_1 == 0 && _2 != 1
As we can see here, by using &&, we combine multiple expression evaluations, into a single rule expression. In this case the expression is effectively saying "if argument one is equal to zero AND argument two is not equal to one".
We support an unlimited number of arguments within the expression evaluation capability.
Consider the following exresssions:
_1 > 0 produces the output "Lots of files" _1 == 0 produces "No files" _1 == 1 produces "One file"
Each expression is a valid, but if we evaluate this set of expressions in the wrong order, we will never be able to produce the text "One file" as the _1 > 0 expression would evaluate to true, before we try to evaluate the _1 == 1 expression.
To counter this problem, whenever we define a rule expression (including when there is no rule expression as would be the case when no arguments are supplied), we must also define a rule priority (where a larger number gives higher priority).
Knowing this, let's re-consider the previous set of expressions, this time adding a suitable priority of evaluation for each expression:
expression: priority:
_1 > 1 1
_1 == 0 2
_1 == 1 2
Now that we have a rule priority, we can see that the _1 == 0 expression and the _1 == 1 expression will get evaluated before the _1 > 1 expression.
You will notice that two rules have the same priority (i.e. we can have any number of rules having the same priority); in this case, the rules are evaluated in a non-deterministic (first found, first evaluated) manner. Thus it is important to make sure that a given rule expression, has a valid rule priority, for the rule set.
Now that we know what a linguistic rule is, we need to explain some minor but important points.
Each symbol in a rule expression needs to be separated with a space, i.e. this works:
_1 > 4 left(_2,1) eq "f"
this doesn't:
_1>4 left(_2,1)eq"f"
Whenever we are using a string operator, we must enquote the value that we are testing, i.e. this works:
_1 eq "fred"
this doesn't:
_1 eq fred
We support single and double quote characters, including mixed quoting (for simplistic cases), i.e. these work:
_1 eq "some text" _1 eq 'some text' _1 eq "someone's dog" _1 eq '"john spoke"'
this doesn't (i.e. there is no quote 'escape' capability):
_1 eq "\"something\""
Note that expressions are not unary, as in (this checks if the first argument has any length):
length(_1)
rather, they should look like:
length(_1) > 0
The following description of rule evaluation is correct at the time of writing. However, as this module evolves, we may alter the implementation as we get feedback. If you have used this module and found that the rule evaluation order is not what you expect, please contact the maintainer.
So far we have discussed the concept that, a translation exists for a language/dialect combination. However, the application may not be translated into the specific language requested by the user. In these cases, Locale::MakePhrase tries to use fallback languages as the source language for this translation request. This allows languages derived from other base languages (eg Spanish and Mexican share common words) and dialect specific variations of languages (such as variations of English), to use the parent language as a source for possible translations.
Thus whenever a phrase cannot be translated directly into the requested localisation, Locale::MakePhrase will use a fallback mechanisn for the input phrase.
Also, to support variations in output text which can exist in locale-specific translations, non-expression rules should be evaluated after rules which have an expression.
The implementation of which rule to select, has been abstracted into a seperate module so that you can implement your own process of which rule is selected based on the available rules. The default implementation is defined in Locale::MakePhrase::RuleManager. It contains a description of the current implementation.
Locale::MakePhrase generates a list of rules which are applicable for required languages (plus any fallback languages).
The rules are sorted by the Locale::MakePhrase::RuleManager module.
Each expression from the sorted rules are evaluated. If the rule succeeds, the corresponding text is returned. If not, the next rule is evaluated.
If finally no match is found, the input string is used as the output string.
Any arguments are then applied to the output string.
Shown below are examples of various rules; some rules have no expressions and/or arguments; all rules must have at least a priority of zero or more.
Language: en_US Input text: Please select some colours. Expression: (none) Priority: 0 Output text: Please select some colors.
Language: en Input text: Please select some colours. Expression: (none) Priority: 0 Output text: Please select colours.
Language: en_AU Input text: Please select [_1] colours. Expression: (none) Priority: 0 Output text: Please select [_1] colours.
Language: en Input text: Please select [_1] colours. Expression: _1 > 0 Priority: 0 Output text: Select [_1] colours.
Given that the preferred language is 'en_US', if you compare rule 1 vs rule 2, the linguistic rule evaluation mechanism will be applied to rule 1 before being applied to rule 2, as it has a higher language-order.
Compare rule 3 vs rule 4. Given that there is no expression associated with rule 3, but that the 'en' version does have an expression, rule 4 will be evaluated (and found to be true in some cases) before example 3 is evaluated.
These examples show that it is important to consider the interactions of the linguistic rules, as they are applied to the current localisation.
With any text translation system, there comes a time when it is necessary to apply the values of the arguments 'in situ', replacing the square-bracket argument number, with the corresponding argument value, so that the output will say something useful. This happens after all rules have been applied (if there were any), and after the output text string has been chosen.
For example:
Input text: "Selected [_2] files, [_1] directories" Arguments: 3 21
Apply rules...
Rule text: "Selected [_2] files, [_1] directories" Output text: "Selected 21 files, 3 directories"
This is a list of all operators:
Operator Context Meaning Example ---------------------------------------------------------------------- == Numeric Equal to _1 == 4 != Numeric Not equal to _1 != 2 > Numeric Greater than _2 > 1 < Numeric Less than _1 < 7 >= Numeric Less than or equal to _4 >= 21 <= Numeric Greater than or equal to _3 <= 12 eq String Equal to _1 eq "some text" ne String Not equal to _2 ne "something else"
This is a list of available functions:
Function Context Meaning Example
----------------------------------------------------------------------
defined(x) - Is the argument defined/not-null, defined(_1)
returns 0 or 1
length(x) - Length of value of the argument, length(_1)
returns an integer >= 0
abs(n) Number Numerical absolute of argument abs(_3)
lc(s) String Lowercase version lc(_1)
uc(s) String Uppercase version uc(_2)
left(s,n) String LHS of argument from start left(_3,4)
right(s,n) String RHS of argument from end right(_1,2)
substr(s,n) String RHS of argument from start substr(_2,7)
substr(s,n,l) String Sub-part of argument from 'n', substr(_2,7,4)
up to 'l' characters
The following functions are used by the Locale::MakePhrase class. By sub-classing this module, then overloading these functions, Locale::MakePhrase can use yor custom RuleManager module.
Construct a new instance of Locale::MakePhrase::RuleManager object; arguments are passed to the init() method.
Allow sub-class a chance to control construction of the object. You must return a reference to $self, to 'allow' the construction to complete (should you decide to derive from it).
This is the expression evaluation engine. It takes an expression as described above (for example _1 == 4 && _2 eq 'fred'). It then takes any program arguments, applying them in-place of the _X place holders. Finally returning true / false, based on the result of the evaluation of the expression.
The guts of the sorter; by subclassing this module, you can implement your own sorting routine.
This module implements the following rules for deciding the sorted order of the rules. The aim is to return a list which can be evaluated in-order.
Rules are sorted histest to lowest priority, for the primary language, for rules which have expressions.
The next available fallback language is chosen as the language to use; step 1 is repeated.
This process continues until no further fallback languages are available.
The non-expression rules are then evaluated according to the preferred language.
If that fails, the fallback languages are tried. This continues for each fallback language.
This applies any/all arguments, to the outgoing text phrase; if the
argument is text, it (optionally) undergoes the translation process;
if the argument is numeric, it is formatted by the Locale::MakePhrase
format_number method.
| Locale-MakePhrase documentation | Contained in the Locale-MakePhrase distribution. |
package Locale::MakePhrase::RuleManager; our $VERSION = 0.4; our $DEBUG = 0;
use strict; use warnings; use utf8; use base qw(); use Data::Dumper; use Locale::MakePhrase::Utils qw(is_number left right alltrim die_from_caller); local $Data::Dumper::Indent = 1 if $DEBUG; use constant STR_INVALID_TRANSLATED => "<INVALID_TRANSLATION>"; # Available datatypes use constant UNKNOWN => -1; use constant UNSPECIFIED => 0; use constant NUMBER => 1; use constant STRING => 2; # Available operators our %OPERATORS = ( '==' => NUMBER, '!=' => NUMBER, '<' => NUMBER, '>' => NUMBER, '<=' => NUMBER, '>=' => NUMBER, 'eq' => STRING, 'ne' => STRING, ); # Available functions our %FUNCTIONS = ( 'defined' => [ UNSPECIFIED, [], sub { defined($_[0]); } ], 'length' => [ UNSPECIFIED, [], sub { length($_[0]); } ], 'abs' => [ NUMBER, [], sub { abs($_[0]); } ], 'lc' => [ STRING, [], sub { lc($_[0]); } ], 'uc' => [ STRING, [], sub { uc($_[0]); } ], 'left' => [ STRING, [1], sub { left($_[0],$_[1]); } ], 'right' => [ STRING, [1], sub { right($_[0],$_[1]); } ], 'substr' => [ STRING, [1,2], sub { return substr($_[0], $_[1]) if @_ == 2; return substr($_[0], $_[1], $_[2]); } ], ); # Predefined regular expression patterns my $ops_re; { my $tmp = join ('|', keys %OPERATORS); $ops_re = qr/$tmp/; } my $ltr_re = qr/^_(\d+)\s+($ops_re)\s+(.*)/; my $rtl_re = qr/(.*)\s+($ops_re)\s+_(\d+)$/; my $func_ltr_re = qr/^([a-zA-Z0-9_]+)\(_(\d+)([^)]*)\)\s+($ops_re)\s+(.*)$/; my $func_rtl_re = qr/^(.*)\s+($ops_re)\s+([a-zA-Z0-9_]+)\(_(\d+)([^)]*)\)$/; #--------------------------------------------------------------------------
# Constructor sub new { my $proto = shift; my $class = ref($proto) || $proto; my $self = bless {}, $class; return $self->init(@_); } #--------------------------------------------------------------------------
sub init { shift } #--------------------------------------------------------------------------
sub evaluate { my $self = shift; die("Missing rule expression?!") unless @_; my $expression = alltrim(shift); print STDERR "Evaluating rule: $expression\n" if $DEBUG > 1; print STDERR "Arguments: ". Dumper(\@_) if $DEBUG > 5; my $evaluation = 0; my @expressions; my $arg_count = scalar(@_); # Break apart expression into subexpressions, so that it can be validated my @subexpressions; foreach my $chunk (split('\s+&&\s+',$expression)) { my ($arg,$val,$op,$text,$quote,$func,$func_args) = ("","","","","","",""); my ($val_type,$op_type,$text_type,$func_type) = (UNKNOWN,UNKNOWN,UNKNOWN,UNKNOWN); # Break apart subexpression if ($chunk =~ $ltr_re) { ($arg,$op,$text) = ($1,$2,$3); } elsif ($chunk =~ $rtl_re) { ($text,$op,$arg) = ($1,$2,$3); } elsif ($chunk =~ $func_ltr_re) { ($func,$arg,$func_args,$op,$text) = ($1,$2,$3,$4,$5); } elsif ($chunk =~ $func_rtl_re) { ($text,$op,$func,$arg,$func_args) = ($1,$2,$3,$4,$5); } else { die_from_caller("Invalid subexpression ($chunk) found in expression:",$expression); } # Grab properties for this subexpression - test for conformity die_from_caller("Missing argument ?!") unless $arg; die_from_caller("Missing operator ?!") unless $op; die_from_caller("Missing text value ?!") unless length $text; die_from_caller("Unknown operator: $op") unless exists $OPERATORS{$op}; die_from_caller("Incorrect number of arguments in expression (trying to use $arg arguments when $arg_count arguments supplied):",$expression) if ($arg > $arg_count); if ($text =~ /^(['"])(.*)(["'])$/) { $quote = $1 if ($1 eq $3); $text = defined $2 ? $2 : ""; } $val = $_[$arg-1]; $op_type = $OPERATORS{$op}; $val_type = not defined $val ? UNSPECIFIED : is_number($val) ? NUMBER : STRING; $text_type = is_number($text) ? NUMBER : STRING; die_from_caller("Missing quote-marks for text value: $text") if ($op_type == STRING and not length $quote); die_from_caller("Mis-matched text-value/operator types\n- text-value: $text \n- operator: $op") if ($op_type == NUMBER and $text_type != NUMBER); print STDERR "Rule properties:\n- argument: $arg\n- function: $func\n- operator: $op\n- text: $text\n- quote: $quote\n- value: ", defined $val ? $val : "", "\n" if $DEBUG > 1; # build expression if ($func) { die_from_caller("Unknown function: $func") unless exists $FUNCTIONS{$func}; $func_type = $FUNCTIONS{$func}->[0]; die_from_caller("Mis-matched function/operator types\n- function: $func\n- operator: $op") if ($func_type != UNSPECIFIED and $func_type != $op_type); die_from_caller("Invalid use of undefined argument (argument number: $arg) when used with function: $func") if ($val_type == UNSPECIFIED and $func_type != UNSPECIFIED); my $required = $FUNCTIONS{$func}->[1]; my $sub = $FUNCTIONS{$func}->[2]; if (@$required) { die_from_caller("Incorrect number of arguments for function: $func (need: ". join(',',@$required) ." - none provided)") unless length $func_args; $func_args = alltrim($func_args); my @func_args = split(/\s*,\s*/,$func_args); shift @func_args if @func_args > 1; my $found = 0; foreach my $required_arg_count (@$required) { $found++ if @func_args == $required_arg_count; } die_from_caller("Incorrect number of arguments for function: $func (need: ". join(',',@$required) ." - provided: ".scalar(@func_args).")") unless $found; $val = &$sub($val,@func_args); } else { $val = &$sub($val); } $val = 0 unless defined $val; $val_type = $op_type; print STDERR "Function result: ", defined $val ? $val : "", "\n" if $DEBUG > 5; } else { die_from_caller("Invalid use of undefined argument (argument number: $arg) expression: $expression") unless (defined $val); } print STDERR "- op_type $op_type\n- val_type $val_type\n- text_type $text_type\n" if $DEBUG > 3; die_from_caller("Mis-matched argument/operator types\n- argument: $val\n- operator: $op") if (($op_type == STRING and $val_type != STRING) or ($op_type == NUMBER and $val_type != NUMBER)); push @subexpressions, "$quote$val$quote $op $quote$text$quote"; } die("Failed building expression") unless @subexpressions; my $parsed_expression = join(' && ',@subexpressions); # Evaluate expression - needs to return some sort or true or false value. # Note that under Perl, this is as simple as an 'eval' :-) my $expr_result = eval $parsed_expression; $expr_result = $expr_result ? 1 : 0; print STDERR "Expression to evaluate: $parsed_expression result: $expr_result\n" if $DEBUG; return $expr_result; } #--------------------------------------------------------------------------
sub sort { my ($self,$rule_objs,$languages) = @_; my @new_order; my @non_ruled; foreach my $language (@$languages) { my @r; my @nr; foreach my $r_obj (@$rule_objs) { next unless $r_obj->language eq $language; if ($r_obj->expression) { push @r, $r_obj; } else { push @nr, $r_obj; } } @r = sort { $b->priority <=> $a->priority } @r; push @new_order, @r; @nr = sort { $b->priority <=> $a->priority } @nr; push @non_ruled, @nr; } push @new_order, @non_ruled; return \@new_order; } #--------------------------------------------------------------------------
sub apply_arguments { my ($self,$makephrase,$translation,@args) = @_; print "HERE:".Dumper(@args) if $DEBUG; my $arg_count = scalar(@args); my $output = ""; my $in_group = 0; WHILE_LOOP: while($translation =~ /\G( [^\~\[\]]+ # non-~[] stuff | ~[\[\]\~] # ~[, ~], ~~ | \[ # [ presumably opening a group | \] # ] presumably closing a group | .* # any other characters | $ )/xgs ){ my $chunk = defined $1 ? $1 : ""; if ($chunk eq '[') { die_from_caller("Found recursive beginning square-bracket:",$translation) if $in_group; $in_group++; } elsif ($chunk eq ']') { $in_group--; die_from_caller("Found recursive ending square-bracket:",$translation) if $in_group; } elsif ($in_group) { # inside square bracket group $chunk = alltrim($chunk); if ($chunk =~ /^_/) { my ($idx,$options) = split(/\s*,\s*/,$chunk,2); $options = { split(/\s*(?:(?:=>)|(?:,))\s*/,$options) } if $options; $idx =~ s/^_//; $idx = int($idx)-1; if ($idx >= $arg_count) { die_from_caller("Incorrect number of arguments used in translation - supplied $arg_count, tried to use at least $idx.") if ($makephrase->{die_on_bad_translation}); $output .= STR_INVALID_TRANSLATED; last WHILE_LOOP; } else { my $val = $args[$idx]; if (defined $val and length $val) { if (is_number($val)) { $output .= $makephrase->format_number($val,$options); } elsif ($makephrase->{translate_arguments}) { $output .= $makephrase->translate($val); } else { $output .= $val; } } } } else { if ($makephrase->{die_on_bad_translation}) { die_from_caller("Invalid translated string: $translation"); } $output .= STR_INVALID_TRANSLATED; last WHILE_LOOP; } } elsif (substr($chunk,0,1) eq '~') { $output .= substr($chunk,1); } else { $output .= $chunk; } } return $output; } 1; __END__ #-------------------------------------------------------------------------- =cut