Text::Unaccent::PurePerl - remove accents from characters


Text-Unaccent-PurePerl documentation. Contained in the Text-Unaccent-PurePerl distribution.

NAME

Text::Unaccent::PurePerl - remove accents from characters

SYNOPSIS

  use Text::Unaccent::PurePerl qw(unac_string);

  $unaccented = unac_string($string);

  # For compatibility with Text::Unaccent, and
  # for dealing with strings of raw octets:

  $unaccented = unac_string($charset, $octets);
  $unaccented = unac_string_utf16($octets);

  # Provided for compatibility with Text::Unaccent;
  # they serve no useful purpose in this module.
  $version = unac_version();
  unac_debug($level);

DESCRIPTION

Text::Unaccent::PurePerl is a module for removing accents from a string. It is essentially a pure Perl equivalent of the Text::Unaccent module, but it also handles character strings properly, whereas Text::Unaccent only deals with raw octet strings with an associated character encoding. In addition, as the name suggests, this module does not require a C compiler to build. The disadvantage is that it is much slower than the compiled Text::Unaccent.

EXPORT

Functions exported by default: unac_string, unac_string_utf16, unac_version, and unac_debug.

FUNCTIONS

unac_string CHARACTER_STRING
unac_string ENCODING, OCTET_STRING

Return the unaccented equivalent of the input string. The one-argument version assumes the input is a Perl string, i.e., a sequence of characters. (A character has an ordinal value in the range 0...(2**32-1), or more.)

The two-argument version assumes the input is a sequence of octets, i.e., raw, encoded data. (An octet is eight bits of data with an ordinal value in the range 0...255.) It is essentially equivalent to the following unaccent() function:

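  use Text::Unaccent::PurePerl qw(unac_string);
  use Encode qw(encode decode);

  sub unaccent {
      my ($enc, $oct) = @_;
      # Decode octets to characters, unaccent, then re-encode.
      encode($enc, unac_string(decode($enc, $oct)));
  }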

except that this module's unac_string() requires neither the Encode module nor the C compiler needed to build Text::Unaccent.
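The difference between a character string and an octet string can be made concrete with a minimal sketch. It uses only the core Encode module, not this module, and the variable names are illustrative assumptions.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string: four characters, the last being U+00E9 (e with acute).
my $chars = "caf\x{E9}";

# The same text as UTF-8 octets: the accented letter occupies two octets.
my $octets = encode("UTF-8", $chars);

printf "characters: %d, octets: %d\n", length($chars), length($octets);
# prints "characters: 4, octets: 5"

# Decoding the octets recovers the original character string.
my $roundtrip = decode("UTF-8", $octets);
print "round trip ok\n" if $roundtrip eq $chars;
```

A string like $chars is what the one-argument form expects, while a string like $octets, together with its encoding name, is what the two-argument form expects.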

unac_string_utf16 OCTET_STRING

This function is mainly provided for compatibility with Text::Unaccent. It is equivalent to

    unac_string("UTF-16BE", OCTET_STRING);
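To illustrate the kind of input unac_string_utf16() expects, the following sketch builds UTF-16BE octets with the core Encode module and prints them in hex. The helper hex_octets() is a hypothetical name introduced here for display only; it is not part of this module.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render an octet string as space-separated hex pairs.
sub hex_octets {
    my ($octets) = @_;
    return join " ", map { sprintf("%02X", ord($_)) } split //, $octets;
}

my $chars  = "d\x{E9}j\x{E0}";             # "déjà" as a character string
my $octets = encode("UTF-16BE", $chars);   # two octets per character here

print hex_octets($octets), "\n";           # prints "00 64 00 E9 00 6A 00 E0"
```

An octet string such as $octets is the sort of value that would be passed as OCTET_STRING.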

unac_version

This function is provided only for compatibility with Text::Unaccent. It returns the version of this module.

unac_debug LEVEL

This function is provided only for compatibility with Text::Unaccent. It has no effect on the behaviour of this module.

EXAMPLES

French

  $str1 = "déjà vu";
  $str2 = unac_string($str1);
  #     = "deja vu";

Greek

  $str1 = "ένα";
  #     = "\x{03AD}\x{03BD}\x{03B1}";

  $str2 = unac_string($str1);
  #     = "ενα";
  #     = "\x{03B5}\x{03BD}\x{03B1}"

The unaccented string $str2 consists of the three letters epsilon (without the tonos), nu, and alpha.

In contrast, the version of unac_string() in the Text::Unaccent module gives

  $oct2 = unac_string("UTF-8", $str1);
  #     = "\xCE\xB5\xCE\xBD\xCE\xB1"

These octets are the UTF-8 encoded equivalent of "\x{03B5}\x{03BD}\x{03B1}".
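The relationship between the characters and the octets can be checked independently of either module. The sketch below uses only the core Encode module; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The unaccented characters: epsilon, nu, alpha.
my $unaccented = "\x{03B5}\x{03BD}\x{03B1}";

# Encode the characters as UTF-8 octets.
my $octets = encode("UTF-8", $unaccented);

# Show each octet as two hex digits.
my $hex = join " ", map { sprintf("%02X", ord($_)) } split //, $octets;
print "$hex\n";   # prints "CE B5 CE BD CE B1"
```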

BUGS

There are currently no known bugs.

Please report any bugs or feature requests to bug-text-unaccent-pureperl at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Unaccent-PurePerl. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Text::Unaccent::PurePerl

You can also look for information at:

* RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Text-Unaccent-PurePerl

* AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/Text-Unaccent-PurePerl

* CPAN Ratings

http://cpanratings.perl.org/d/Text-Unaccent-PurePerl

* Search CPAN

http://search.cpan.org/dist/Text-Unaccent-PurePerl

SEE ALSO

Text::Unaccent(3).

AUTHOR

Peter J. Acklam, <pjacklam@cpan.org>

COPYRIGHT & LICENSE
