
Appendix C: Perl Code Examples
CJKV Encoding Templates
The following are some encoding specifications that can be used for handling various
CJKV encodings. In particular, they are useful in conjunction with automatically
detecting CJKV encodings.
EUC-CN and EUC-KR Encodings
$euc = q{
[\x00-\x7F] # Code set 0 (ASCII or equivalent)
| [\xA1-\xFE][\xA1-\xFE] # Code set 1 (GB 2312-80 or KS X 1001:2004)
};
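Because the template contains whitespace and comments, it is presumably meant to be interpolated into a pattern compiled with the /x modifier. The following is a minimal sketch of that usage, not a listing from this appendix: the helper name is_euc and the sample byte strings are assumptions made for illustration. It checks whether a byte string consists entirely of well-formed EUC-CN or EUC-KR characters.

```perl
use strict;
use warnings;

# Template copied from the listing above.
my $euc = q{
    [\x00-\x7F]               # Code set 0 (ASCII or equivalent)
  | [\xA1-\xFE][\xA1-\xFE]    # Code set 1 (GB 2312-80 or KS X 1001:2004)
};

# is_euc() is a hypothetical helper: it returns 1 when the template
# consumes every byte of the string, and 0 otherwise.
sub is_euc {
    my ($bytes) = @_;
    return $bytes =~ /\A(?:$euc)*\z/x ? 1 : 0;
}

my $valid   = "A\xB0\xA1B";   # ASCII, one two-byte character, ASCII
my $invalid = "A\xB0";        # truncated two-byte character

print is_euc($valid)   ? "valid\n" : "invalid\n";   # prints "valid"
print is_euc($invalid) ? "valid\n" : "invalid\n";   # prints "invalid"
```

The \A and \z anchors matter here: without them the pattern would succeed on any string that merely contains a well-formed character somewhere.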
EUC-TW Encoding
$euc_tw = q{
[\x00-\x7F] # Code set 0 (CNS-Roman)
| [\xA1-\xFE][\xA1-\xFE] # Code set 1 (Plane 1)
| \x8E[\xA1-\xF0][\xA1-\xFE][\xA1-\xFE] # Code set 2 (Planes 1-80)
};
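The code set 2 line implies that the byte following 0x8E selects the plane, with 0xA1 standing for plane 1 and 0xF0 for plane 80. As a hedged sketch of that relationship, the hypothetical helper euc_tw_plane below (not part of this appendix) takes a single EUC-TW character as a byte string and returns its plane number.

```perl
use strict;
use warnings;

# euc_tw_plane() is a hypothetical helper. It expects exactly one
# EUC-TW character and returns its plane, or nothing for ASCII or
# malformed input.
sub euc_tw_plane {
    my ($char) = @_;

    # Code set 1 always encodes plane 1.
    return 1 if $char =~ /\A[\xA1-\xFE][\xA1-\xFE]\z/;

    # Code set 2: the byte after 0x8E is the plane number plus 0xA0.
    return ord($1) - 0xA0
        if $char =~ /\A\x8E([\xA1-\xF0])[\xA1-\xFE][\xA1-\xFE]\z/;

    return;
}

print euc_tw_plane("\xA4\xA1"), "\n";           # prints "1"
print euc_tw_plane("\x8E\xA2\xA1\xA1"), "\n";   # prints "2"
```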
EUC-JP Encoding
$euc_jp = q{
[\x00-\x7F] # Code set 0 (ASCII/JIS-Roman)
| [\xA1-\xFE][\xA1-\xFE] # Code set 1 (JIS X 0208:1997)
| \x8E[\xA0-\xDF] # Code set 2 (Half-width katakana)
| \x8F[\xA1-\xFE][\xA1-\xFE] # Code set 3 (JIS X 0212-1990)
};
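A template can also be used to break a byte string into individual characters. The sketch below assumes the input is already known to be EUC-JP; the helper name split_euc_jp and the sample string are illustrative assumptions, not code from this appendix. The \G anchor makes each match start where the previous one ended, so matching stops at the first malformed byte instead of skipping over it.

```perl
use strict;
use warnings;

# Template copied from the listing above.
my $euc_jp = q{
    [\x00-\x7F]                   # Code set 0 (ASCII/JIS-Roman)
  | [\xA1-\xFE][\xA1-\xFE]        # Code set 1 (JIS X 0208:1997)
  | \x8E[\xA0-\xDF]               # Code set 2 (Half-width katakana)
  | \x8F[\xA1-\xFE][\xA1-\xFE]    # Code set 3 (JIS X 0212-1990)
};

# split_euc_jp() is a hypothetical helper: it returns the leading run
# of well-formed characters, one list element per character.
sub split_euc_jp {
    my ($bytes) = @_;
    my @chars = $bytes =~ /\G($euc_jp)/gx;
    return @chars;
}

# One character from each code set: 1, 2, 3, then 0.
my $text  = "\xA4\xA2" . "\x8E\xB1" . "\x8F\xB0\xA1" . "A";
my @chars = split_euc_jp($text);

print scalar(@chars), " characters\n";               # prints "4 characters"
print join(",", map { length($_) } @chars), "\n";    # prints "2,2,3,1"
```

Comparing the combined length of the returned characters with the length of the input is one way to tell whether the whole string was well formed.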
GBK and Big Five Plus Encodings
$gbk = q{
[\x00-\x7F] # ASCII or equivalent
| [\x81-\xFE][\x40-\x7E\x80-\xFE] # Two-byte (GBK or Big Five Plus)
};
GB 18030 Encoding
$gb18030 = q{
[\x00-\x7F] # ASCII or equivalent
| [\x81-\xFE][\x40-\x7E\x80-\xFE] # Two-byte
| [\x81-\xFE][\x30-\x39][\x81-\xFE][\x30-\x39] # Four-byte
};
Big Five Encoding
$big5 = q{
[\x00-\x7F] # ASCII/CNS-Roman
| [\xA1-\xFE][\x40-\x7E\xA1-\xFE] # Big Five
};
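As one way of using these templates for detection, the sketch below tests a byte string against several of them and reports every encoding whose template consumes the whole string. The helper name candidate_encodings, the labels, and the sample bytes are assumptions for illustration; the templates are the ones above with comments removed. Because the encodings overlap heavily, this only narrows the candidates and does not by itself identify the encoding.

```perl
use strict;
use warnings;

# Templates from the listings above, in a fixed order so that the
# output is deterministic.
my @templates = (
    [ 'EUC-CN/EUC-KR'     => q{ [\x00-\x7F] | [\xA1-\xFE][\xA1-\xFE] } ],
    [ 'EUC-JP'            => q{ [\x00-\x7F] | [\xA1-\xFE][\xA1-\xFE]
                              | \x8E[\xA0-\xDF] | \x8F[\xA1-\xFE][\xA1-\xFE] } ],
    [ 'GBK/Big Five Plus' => q{ [\x00-\x7F] | [\x81-\xFE][\x40-\x7E\x80-\xFE] } ],
    [ 'Big Five'          => q{ [\x00-\x7F] | [\xA1-\xFE][\x40-\x7E\xA1-\xFE] } ],
);

# candidate_encodings() is a hypothetical helper: it returns the label
# of every template that matches the entire byte string.
sub candidate_encodings {
    my ($bytes) = @_;
    return map  { $_->[0] }
           grep { my $re = $_->[1]; $bytes =~ /\A(?:$re)*\z/x }
           @templates;
}

# 0x8E 0xB1 is rejected by the EUC-CN/EUC-KR and Big Five templates.
print join(", ", candidate_encodings("\x8E\xB1")), "\n";
# prints "EUC-JP, GBK/Big Five Plus"

# A second byte of 0x40 rules out the EUC encodings.
print join(", ", candidate_encodings("\xA4\x40")), "\n";
# prints "GBK/Big Five Plus, Big Five"
```

A string such as "\xB0\xA1" matches all four templates, which is why template matching is best treated as a first filter ahead of other evidence.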