| Lingua-AR-MacArabic documentation | Contained in the Lingua-AR-MacArabic distribution. |
Lingua::AR::MacArabic - transcoding between Mac OS Arabic encoding and Unicode
(1) using function names exported by default:
use Lingua::AR::MacArabic;
$wchar = decodeMacArabic($octet);
$octet = encodeMacArabic($wchar);
(2) using function names exported on request:
use Lingua::AR::MacArabic qw(decode encode);
$wchar = decode($octet);
$octet = encode($wchar);
(3) using function names fully qualified:
use Lingua::AR::MacArabic ();
$wchar = Lingua::AR::MacArabic::decode($octet);
$octet = Lingua::AR::MacArabic::encode($wchar);
# $wchar : a string in Perl's Unicode format
# $octet : a string in Mac OS Arabic encoding
This module provides decoding from/encoding to Mac OS Arabic encoding (denoted MacArabic hereafter).
The functions provided here should cope with Unicode text that contains
certain directional formatting codes, namely
PDF (U+202C), LRO (U+202D), and RLO (U+202E).
Arabic-Indic digits and some related characters in Unicode
are encoded in MacArabic as if they were ordinary digits (U+0030..U+0039)
when they appear in the left-to-right direction.
$wchar = decode($octet)
$wchar = decodeMacArabic($octet)
Converts MacArabic to Unicode.
decodeMacArabic() is an alias for decode() exported by default.
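Since decode() returns a string in Perl's Unicode format, it usually has to be encoded again (for example as UTF-8) before it is written to a file or a terminal. The following is a minimal sketch of that step, not part of this module: the helper name macarabic_to_utf8 is hypothetical, and it takes the decoder as a coderef so that it can be tried without the module installed. It relies only on the core Encode module.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not provided by Lingua::AR::MacArabic):
# decode MacArabic octets with the given decoder coderef, then
# return the result as UTF-8 bytes ready for output.
sub macarabic_to_utf8 {
    my ($decoder, $octet) = @_;
    my $wchar = $decoder->($octet);            # string in Perl's Unicode format
    return Encode::encode('UTF-8', $wchar);    # byte string
}

# Call the real decoder only when the module is available,
# so this sketch still runs where it is not installed.
if (eval { require Lingua::AR::MacArabic; 1 }) {
    my $decoder = sub { Lingua::AR::MacArabic::decode($_[0]) };
    print macarabic_to_utf8($decoder, "ABC"), "\n";
}
```

Passing the decoder as an argument is only a convenience for this sketch; in ordinary code, calling decodeMacArabic() directly and then encoding the result is enough.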
$octet = encode($wchar)
$octet = encode($handler, $wchar)
$octet = encodeMacArabic($wchar)
$octet = encodeMacArabic($handler, $wchar)
Converts Unicode to MacArabic.
encodeMacArabic() is an alias for encode() exported by default.
If the $handler is not specified,
any character that is not mapped to MacArabic is deleted.
If the $handler is a code reference,
the string returned from that coderef is inserted in its place.
If the $handler is a scalar reference,
the string (a PV) in its referent is inserted in its place.
The first argument passed to the $handler coderef is
the Unicode code point (an integer) of the unmapped character.
E.g.
sub hexNCR { sprintf("&#x%x;", shift) } # hexadecimal NCR
sub decNCR { sprintf("&#%d;" , shift) } # decimal NCR
print encodeMacArabic("ABC\x{100}\x{10000}");
# "ABC"
print encodeMacArabic(\"", "ABC\x{100}\x{10000}");
# "ABC"
print encodeMacArabic(\"?", "ABC\x{100}\x{10000}");
# "ABC??"
print encodeMacArabic(\&hexNCR, "ABC\x{100}\x{10000}");
# "ABC&#x100;&#x10000;"
print encodeMacArabic(\&decNCR, "ABC\x{100}\x{10000}");
# "ABC&#256;&#65536;"
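A coderef handler can also keep track of what was lost, because it receives the code point of every unmapped character. The following is a minimal sketch under that assumption; make_logging_handler is a hypothetical helper, not something this module provides. It returns a handler that inserts a fixed replacement string and records each unmapped code point in an array.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::AR::MacArabic): build a
# handler that inserts $replacement for each unmapped character and
# remembers the code point it was called with.
sub make_logging_handler {
    my ($replacement) = @_;
    my @seen;
    my $handler = sub {
        my ($code_point) = @_;                      # integer code point
        push @seen, sprintf('U+%04X', $code_point);
        return $replacement;
    };
    return ($handler, \@seen);
}

my ($handler, $seen) = make_logging_handler('?');

# Call the real encoder only when the module is available,
# so this sketch still runs where it is not installed.
if (eval { require Lingua::AR::MacArabic; 1 }) {
    my $octet = Lingua::AR::MacArabic::encode($handler, "ABC\x{100}\x{10000}");
    print "encoded:  $octet\n";
    print "unmapped: @$seen\n";
}
```

The closure keeps the log private to each handler, so separate calls to encode() can use separate logs.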
Sorry, the author does not work on Mac OS. Please let him know if you find anything wrong.
Possible bug: the (default) paragraph direction is not resolved.
Does Mac OS always surround characters whose bidirectional type
is to be overridden with LRO..PDF or RLO..PDF?
SADAHIRO Tomoyuki <SADAHIRO@cpan.org>
Copyright(C) 2003-2011, SADAHIRO Tomoyuki. Japan. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ARABIC.TXT
http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/CORPCHAR.TXT
package Lingua::AR::MacArabic;

require 5.006001;
use strict;
require Exporter;
require DynaLoader;

our $VERSION = '0.10';
our @ISA = qw(Exporter DynaLoader);
our @EXPORT = qw(decodeMacArabic encodeMacArabic);
our @EXPORT_OK = qw(decode encode);

bootstrap Lingua::AR::MacArabic $VERSION;

1;
__END__