
Chapter 4: Encoding Methods
Table 4-79. Charset designators
Encoding    Official charset designator    Preferred charset designator
UTF-16LE    UTF-16LE                       same
UTF-32      UTF-32                         same
UTF-32BE    UTF-32BE                       same
UTF-32LE    UTF-32LE                       same
UCS-2       ISO-10646-UCS-2                same
UCS-4       ISO-10646-UCS-4                same
In order to be compatible with older or poorly implemented software, it is important to
maintain an aliasing mechanism that effectively maps several known charset designations
to the preferred one. The charset registry maintains known aliases for each charset
designator.
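An aliasing mechanism of this kind can be sketched in a few lines of Python. This is only a minimal illustration: the function name preferred_charset and both tables are hypothetical, and the alias entries are a small hand-picked sample rather than a copy of the charset registry. Matching is done case-insensitively, because charset designators are not case-sensitive.

```python
# Preferred designators, keyed by their lowercase form for case-insensitive lookup.
# Hypothetical sample: the Table 4-79 designators plus EUC-KR.
PREFERRED = {
    name.lower(): name
    for name in (
        "UTF-16LE", "UTF-32", "UTF-32BE", "UTF-32LE",
        "ISO-10646-UCS-2", "ISO-10646-UCS-4", "EUC-KR",
    )
}

# Illustrative alias table (lowercase alias -> preferred designator).
# These entries are examples only, not a complete registry.
ALIASES = {
    "cseuckr": "EUC-KR",
    "ks_c_5601-1987": "EUC-KR",
    "csucs4": "ISO-10646-UCS-4",
    "csunicode": "ISO-10646-UCS-2",
}


def preferred_charset(designator):
    """Return the preferred designator for a name or alias, or None if unknown."""
    key = designator.strip().lower()
    if key in PREFERRED:
        return PREFERRED[key]
    return ALIASES.get(key)


print(preferred_charset("KS_C_5601-1987"))
print(preferred_charset("utf-32be"))
print(preferred_charset("no-such-charset"))
```

Returning None for an unknown designator leaves the choice of fallback to the caller, which seems safer than silently guessing an encoding.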
Some charset designations can withstand the test of time, and others cannot. For example,
consider Korean, for which there are two camps when it comes to charset designations.
One camp prefers to use the designation “KS_C_5601-1987” for EUC-KR encoding. The
other camp simply prefers to use “EUC-KR” for EUC-KR encoding. In case it is not obvious,
I belong to the latter camp. In 1998, all KS character set standards changed designation.
For example, KS C 5601-1992 became KS X 1001:1992 and is now KS X 1001:2004.
As you can clearly see, the use of “KS_C_5601-1987” as a charset designator did not
withstand the test of time. However, the use of “EUC-KR” is still completely valid and still
preferred, and clearly has withstood the test of time.
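Both designations are still found in data, so software generally has to accept either. As one illustration, the short Python check below uses the standard library's codecs.lookup function to ask whether two designators resolve to the same codec. The helper name same_codec is hypothetical, and the result reflects how Python's own alias table treats these names, not what every platform does.

```python
import codecs


def same_codec(designator_a, designator_b):
    """Return True if Python resolves both charset designators to one codec."""
    return codecs.lookup(designator_a).name == codecs.lookup(designator_b).name


for designator in ("KS_C_5601-1987", "EUC-KR"):
    # codecs.lookup raises LookupError for a designator Python does not know.
    print(designator, "->", codecs.lookup(designator).name)

print(same_codec("KS_C_5601-1987", "EUC-KR"))
```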
Code Pages
In the context of IBM and Microsoft documentation, there is often mention of a Code
Page. A Code Page is ...