
282 | Chapter 4: Encoding Methods
Table 4-84. Microsoft Code Pages

Code Page  Characteristics
949        KS X 1001:2004 character set, Unified Hangul Code encoding, remaining 8,822 hangul as extension
950        Big Five character set, Big Five encoding, Microsoft extensions (actually, only the ETen extensions of Row 9)
1258       TCVN-Roman character set
1361       Johab character set, Johab encoding

Although the Code Page designations of Microsoft Code Pages have remained the same
over the years, their contents or definitions have been expanded to cover additional
characters or new encodings that are true supersets of the previous version. Code Page 936,
for example, was once the GB 2312-80 character set encoded according to EUC-CN
encoding, but is now based on GBK. Also, Code Page 949 was once the KS X 1001:2004
character set encoded according to EUC-KR, but is now defined as Unified Hangul Code
(UHC) encoding, which is detailed in Appendix F.
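
The superset relationship described for Code Page 936 can be checked with a short Python sketch. It assumes that Python's "cp936" codec is an alias for its GBK codec and that its "gb2312" codec implements EUC-CN; both are properties of Python's standard library, not something this chapter specifies. The sample strings are chosen only for illustration.

```python
# Assumption: Python's "cp936" codec is GBK, and "gb2312" is EUC-CN.
text = "中文"  # both characters are in GB 2312-80

# Text within GB 2312-80 yields the same bytes under both definitions.
old_bytes = text.encode("gb2312")
new_bytes = text.encode("cp936")
same_bytes = old_bytes == new_bytes

# Data written under the old definition still decodes under the new one.
round_trip = old_bytes.decode("cp936")

# A traditional form that GBK covers but GB 2312-80 does not.
extended = "龍"
extended_bytes = extended.encode("cp936")
try:
    extended.encode("gb2312")
    fits_old_definition = True
except UnicodeEncodeError:
    fits_old_definition = False

print(same_bytes, round_trip == text, fits_old_definition)
```

Data produced under the older definition remains valid, while data that uses the added characters cannot be written back with the older encoding.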
Code Conversion
Put simply, code conversion is interoperability between encodings. Given that encodings
must interoperate, code conversion is a necessary and fundamental task that even the
simplest text-processing software must perform. Code conversion must also be done
correctly, or else any subsequent processing will propagate the errors that were
introduced, such as incorrect or missing characters.
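
As a minimal sketch of this idea, the following Python code converts between two encodings by decoding to Unicode and re-encoding. The convert function is a hypothetical helper, and using Unicode as the pivot is a choice of this example rather than a method prescribed by the text. Strict error handling is used so that an unconvertible character stops the conversion instead of being silently lost.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another (hypothetical helper).

    Decoding and encoding are both strict, so invalid input or a
    character missing from the target raises an exception.
    """
    text = data.decode(source, errors="strict")
    return text.encode(target, errors="strict")


# EUC-KR to UTF-8: both characters exist in both encodings.
korean = "한글".encode("euc_kr")
as_utf8 = convert(korean, "euc_kr", "utf-8")

# A lenient conversion hides the problem: the character that GB 2312
# lacks is replaced by "?", and that error propagates downstream.
lossy = "龍".encode("gb2312", errors="replace")

# The strict helper reports the same problem instead.
try:
    convert("龍".encode("gbk"), "gbk", "gb2312")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(as_utf8.decode("utf-8"), lossy, strict_failed)
```

Whether to fail, substitute, or log on an unconvertible character is a design decision; the strict default here simply makes the loss visible at the point of conversion.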
Conversion of CJKV text from one