NAME

HTML::Entities::ImodePictogram - encode / decode i-mode pictogram

SYNOPSIS

      use HTML::Entities::ImodePictogram;

      $html      = encode_pictogram($rawtext);
      $rawtext   = decode_pictogram($html);
      $cleantext = remove_pictogram($rawtext);

      use HTML::Entities::ImodePictogram qw(find_pictogram);

      $num_found = find_pictogram($rawtext, \&callback);

DESCRIPTION

HTML::Entities::ImodePictogram handles HTML entities for i-mode pictograms (emoji), which are assigned to the Shift_JIS private area.

See http://www.nttdocomo.co.jp/i/tag/emoji/index.html for details about i-mode pictograms.

FUNCTIONS

In all functions in this module, input and output strings are assumed to be encoded in Shift_JIS. See the Jcode manpage for conversion between Shift_JIS and other encodings such as EUC-JP or UTF-8.
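As a rough illustration of the conversion step, the sketch below uses Perl's core Encode module rather than Jcode (a substitution made here, not something this module prescribes). It converts ordinary Japanese text to Shift_JIS bytes and back out to UTF-8. The sample string and variable names are assumptions, and the sketch covers ordinary text only; how private-area pictogram bytes behave under a particular converter is not addressed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample text (assumed for illustration): the three characters of "Nihongo",
# written as escapes so the source file itself stays ASCII.
my $text = "\x{65E5}\x{672C}\x{8A9E}";

# Character string -> Shift_JIS bytes, the form the functions here expect.
my $sjis_bytes = encode('shiftjis', $text);

# Shift_JIS bytes -> UTF-8 bytes, e.g. for output on a UTF-8 page.
my $utf8_bytes = encode('UTF-8', decode('shiftjis', $sjis_bytes));

printf "Shift_JIS: %s\n", uc(unpack('H*', $sjis_bytes));
printf "UTF-8:     %s\n", uc(unpack('H*', $utf8_bytes));
```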

This module exports the following functions by default.

encode_pictogram

          $html = encode_pictogram($rawtext);
          $html = encode_pictogram($rawtext, unicode => 1);

        Encodes pictogram characters in raw text into HTML entities. If
        $rawtext contains extended pictograms, they are encoded in Unicode
        format. If you pass the "unicode" option explicitly, all pictogram
        characters are encoded in Unicode format ("&#xFFFF;"). Otherwise,
        encoding is done in decimal format ("&#NNNNN;").
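        The two entity formats can be sketched as follows. format_entity()
        is a hypothetical helper, not part of this module, and the sample
        codepoints are assumed values used only for illustration. The
        decimal form matches the stub shown under find_pictogram; the hex
        form and its upper-case digits follow the usual HTML numeric
        character reference syntax and are an assumption about the exact
        output.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: formats one pictogram as an
# HTML entity.
#   $number - decimal Shift_JIS codepoint
#   $cp     - Unicode codepoint
sub format_entity {
    my ($number, $cp, %opt) = @_;
    return $opt{unicode}
        ? sprintf('&#x%04X;', $cp)       # Unicode (hexadecimal) format
        : sprintf('&#%d;',    $number);  # decimal format
}

# Sample values, assumed for illustration only.
my ($number, $cp) = (0xF89F, 0xE63E);

print format_entity($number, $cp), "\n";                # &#63647;
print format_entity($number, $cp, unicode => 1), "\n";  # &#xE63E;
```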

decode_pictogram

          $rawtext = decode_pictogram($html);

        Decodes HTML entities for pictograms (both "&#xFFFF;" and
        "&#NNNNN;") into raw text in Shift_JIS.

remove_pictogram

          $cleantext = remove_pictogram($rawtext);

        Removes pictogram characters from raw text.

This module also exports the following function on demand.

find_pictogram

          $num_found = find_pictogram($rawtext, \&callback);

        Finds pictogram characters in raw text and executes the callback
        for each one found. It returns the total number of pictogram
        characters found in the text.

        The callback is given three arguments. The first is the pictogram
        character that was found, the second is a decimal number
        representing the Shift_JIS codepoint of that character, and the
        third is its Unicode codepoint. Whatever the callback returns
        replaces the original character in the text.

        Here is a stub implementation of encode_pictogram(), which serves
        as a good example of how to use find_pictogram(). Note that this
        example version doesn't support extended pictograms.

          sub encode_pictogram {
              my $text = shift;
              find_pictogram($text, sub {
                                 my($char, $number, $cp) = @_;
                                 return '&#' . $number . ';';
                             });
              return $text;
          }
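        Because the callback's return value replaces the matched
        character, a callback that returns its first argument leaves the
        text as it was, so find_pictogram() can also be used for read-only
        scans. The sketch below builds such a callback to tally pictograms
        by Shift_JIS codepoint. make_tally_callback() is a hypothetical
        helper, and the arguments in the simulated calls are placeholder
        values, used so the sketch runs without the module installed.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a callback for find_pictogram() that tallies
# pictograms by their decimal Shift_JIS codepoint and leaves the text unchanged.
sub make_tally_callback {
    my ($tally) = @_;      # hash reference filled in by the callback
    return sub {
        my ($char, $number, $cp) = @_;
        $tally->{$number}++;
        return $char;      # returning the character keeps the text as it was
    };
}

my %tally;
my $callback = make_tally_callback(\%tally);

# With the module installed, the callback would be passed like this:
#   use HTML::Entities::ImodePictogram qw(find_pictogram);
#   my $num_found = find_pictogram($rawtext, $callback);

# Simulated calls with placeholder arguments (assumed values).
$callback->("\xF8\x9F", 63647, 0xE63E) for 1 .. 2;

printf "%d seen %d time(s)\n", $_, $tally{$_} for sort keys %tally;
```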

CAVEAT

AUTHOR

Tatsuhiko Miyagawa <miyagawa@bulknews.net>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

the HTML::Entities manpage, the Unicode::Japanese manpage, http://www.nttdocomo.co.jp/p_s/imode/tag/emoji/