| CharsetDetector documentation | view source | Contained in the CharsetDetector distribution. |
CharsetDetector - A charset detector optimized for East Asian charsets and website content
    use CharsetDetector;
    use CharsetDetector qw(detect detect1);

    # simple use
    $charset = CharsetDetector::detect($octets);

    # with a length limit
    $charset = CharsetDetector::detect($octets, $max_len);

    # do not treat the charset declared in the HTML head as a detection factor
    $charset = CharsetDetector::detect1($octets);
    $charset = CharsetDetector::detect1($octets, $max_len);
    $charset = CharsetDetector::detect($octets);
    $charset = CharsetDetector::detect($octets, $max_len);
Detects the charset of $octets. By default, the detector treats a charset declared in the HTML head (e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />) as one factor in detection. If you do not want the declared charset to be considered, use detect1 instead of detect.
    $charset = CharsetDetector::detect1($octets);
    $charset = CharsetDetector::detect1($octets, $max_len);
    if $octets is null, returns ''
    if $octets is '', returns 'iso-8859-1'
    otherwise, returns the charset name
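The return-value rules above mean a caller may want to substitute its own default when the detector reports nothing. The following is a minimal sketch, not part of the module: detect_or_default is a hypothetical helper, and it takes the detector as a code reference so it works with either detect or detect1.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by CharsetDetector): call a detector
# that follows the return-value rules above and fall back to $default
# when it returns '' (or anything undefined).
sub detect_or_default {
    my ($detector, $octets, $default) = @_;
    my $charset = $detector->($octets);
    return (defined $charset && length $charset) ? $charset : $default;
}

# With the module installed, pass a reference to the real function:
#   require CharsetDetector;
#   my $charset = detect_or_default(\&CharsetDetector::detect, $octets, 'utf8');
```

Note that an empty string input is reported as 'iso-8859-1' by the rules above, so the fallback only applies to the '' return.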
    return value : alias
    ascii        : ascii
    iso-8859-1   : iso-8859-1
    utf8         : utf8 utf-8-strict
    utf16        : utf16
    cp936        : euc-cn(gb2312) cp936(gbk) gb18030
    big5-eten    : big5-eten
    euc-jp       : euc-jp
    shiftjis     : shiftjis
    iso-2022-jp  : iso-2022-jp
    euc-kr       : euc-kr
    iso-2022-kr  : iso-2022-kr
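The returned names look like encoding names accepted by Perl's core Encode module, so one plausible next step is to decode the octets into a Perl character string. The sketch below assumes that compatibility. decode_detected is a hypothetical helper, not part of CharsetDetector, and it returns undef when Encode does not recognize the name.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper (not provided by CharsetDetector): decode $octets
# using a charset name such as one returned by detect or detect1.
# Returns undef if the name is empty or unknown to Encode.
sub decode_detected {
    my ($octets, $charset) = @_;
    return undef unless defined $charset && length $charset;
    my $encoding = find_encoding($charset) or return undef;
    return $encoding->decode($octets);
}

# With the module installed:
#   require CharsetDetector;
#   my $text = decode_detected($octets, CharsetDetector::detect($octets));
```

Looking the name up with find_encoding first avoids the fatal error that Encode::decode raises for an unknown encoding name.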
The CharsetDetector module is Copyright (c) 2003-2006 QIAN YU. All rights reserved.
You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.