Encode::Detect::CJK - A charset detector optimized for East Asian charsets and website content


Contained in the Encode-Detect-CJK distribution.

NAME

Encode::Detect::CJK - A charset detector optimized for East Asian charsets and website content

SYNOPSIS

	use Encode::Detect::CJK;               # load the module only

	use Encode::Detect::CJK qw(detect);    # load the module and export the detect function

	# simple usage
	my $charset = CharsetDetector::detect($octets);

	# usage with the optional arguments
	my $charset = CharsetDetector::detect($octets, $max_len, $is_consider_html_head_charset);
	# Returns the charset of the binary string $octets.
	# $max_len: detection is slow when $octets is large, so you may want to
	#	specify $max_len to limit it; undef selects the DEFAULT (no limit).
	# $is_consider_html_head_charset: by DEFAULT, the detector treats the HTML header
	#	(e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />)
	#	as one factor when detecting the charset. If you don't want the detector to
	#	consider the HTML header, set $is_consider_html_head_charset to "" or 0.

Basic Function

detect - detect the charset of a string

	$charset = CharsetDetector::detect($octets, $max_len, $is_consider_html_head_charset);
	$charset = CharsetDetector::detect($octets, $max_len);  # same as CharsetDetector::detect($octets, $max_len, 1);
	$charset = CharsetDetector::detect($octets);            # same as CharsetDetector::detect($octets, undef);

Param $octets - input binary string

The binary string (raw octets) whose charset is to be detected.

Param $max_len - max length for charset detector

Detection is slow when $octets is large, so you may want to specify $max_len to limit it. Pass undef for the DEFAULT, which is unlimited.

Param $is_consider_html_head_charset

By DEFAULT, the detector treats the HTML header (e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />) as one factor when detecting the charset. If you don't want the detector to consider the HTML header, set $is_consider_html_head_charset to "" or 0.
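
To illustrate what a charset declaration in an HTML header looks like to a program, here is a minimal sketch that pulls the declared name out of a <meta> tag. It is not the module's implementation: the helper name meta_charset_hint, the $scan_len parameter and its 4096-byte default are all assumptions made for this example, and the regular expression is deliberately loose.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Encode::Detect::CJK.
# Returns the lowercased charset name declared in a <meta> tag near the
# start of $octets, or undef when no declaration is found.
sub meta_charset_hint {
    my ($octets, $scan_len) = @_;
    return unless defined $octets;

    # Assumed default: only look at the first 4096 bytes.
    $scan_len = 4096 unless defined $scan_len;
    my $head = substr($octets, 0, $scan_len);

    # Matches both <meta charset="..."> and the http-equiv form,
    # where charset=... sits inside the content attribute.
    if ($head =~ /<meta\b[^>]*\bcharset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)/i) {
        return lc $1;
    }
    return;
}
```

A declared charset is only a hint, since pages can be mislabeled, which is presumably why the detector treats it as one factor rather than as the answer.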

Return Value $charset

Returns '' if $octets is undef, 'iso-8859-1' if $octets is the empty string, and otherwise the name of the detected charset.
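
The return value is typically handed to the core Encode module to turn the octets into a Perl character string. The sketch below shows one way to do that while handling the '' return value. The helper decode_detected and its $detector argument are hypothetical names introduced for this example; the detector is passed in as a code reference (for instance \&detect after `use Encode::Detect::CJK qw(detect);`), which also keeps the example runnable without the module installed.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Encode::Detect::CJK.
# $detector is any code ref that maps octets to a charset name.
sub decode_detected {
    my ($octets, $detector) = @_;
    return unless defined $octets;

    my $charset = $detector->($octets);

    # Defensive fallback (an assumption of this sketch): treat a missing
    # or '' answer as iso-8859-1.
    $charset = 'iso-8859-1' unless defined $charset && length $charset;

    my $enc = Encode::find_encoding($charset)
        or die "Encode does not know charset '$charset'\n";

    # Decode the octets into a Perl character string.
    return $enc->decode($octets);
}
```

Usage, with a stub detector standing in for the real one:

```perl
my $text = decode_detected("\xe4\xb8\xad", sub { 'utf8' });
```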

Supported Charset List

	return value: alias

	ascii       : ascii
	iso-8859-1  : iso-8859-1
	utf8        : utf8 utf-8-strict
	utf16       : utf16
	cp936       : euc-cn(gb2312) cp936(gbk) gb18030
	big5-eten   : big5-eten
	euc-jp      : euc-jp
	shiftjis    : shiftjis
	iso-2022-jp : iso-2022-jp
	euc-kr      : euc-kr
	iso-2022-kr : iso-2022-kr
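
Because several names map to one return value, comparing a detected charset with a name declared elsewhere (say, in an HTTP header) needs a lookup rather than a plain string comparison. The following sketch transcribes the table above into a hash. The names %RETURN_VALUE_FOR and same_charset_family are hypothetical, and the table only covers the names listed in this document.

```perl
use strict;
use warnings;

# Alias => return value, transcribed from the table above.
my %RETURN_VALUE_FOR = (
    'ascii'        => 'ascii',
    'iso-8859-1'   => 'iso-8859-1',
    'utf8'         => 'utf8',
    'utf-8-strict' => 'utf8',
    'utf16'        => 'utf16',
    'euc-cn'       => 'cp936',
    'gb2312'       => 'cp936',
    'cp936'        => 'cp936',
    'gbk'          => 'cp936',
    'gb18030'      => 'cp936',
    'big5-eten'    => 'big5-eten',
    'euc-jp'       => 'euc-jp',
    'shiftjis'     => 'shiftjis',
    'iso-2022-jp'  => 'iso-2022-jp',
    'euc-kr'       => 'euc-kr',
    'iso-2022-kr'  => 'iso-2022-kr',
);

# Hypothetical helper: returns 1 when a declared charset name and a
# detected return value sit on the same row of the table, 0 otherwise.
sub same_charset_family {
    my ($declared, $detected) = @_;
    return 0 unless defined $declared && defined $detected;

    my $expected = $RETURN_VALUE_FOR{ lc($declared) };
    return 0 unless defined $expected;

    return ($expected eq lc($detected)) ? 1 : 0;
}
```

For example, same_charset_family('GB2312', 'cp936') is true, since gb2312 is listed as an alias on the cp936 row.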

COPYRIGHT
