
Table 4-77. Ideographs encoded according to Unicode
Ideograph  Unicode  UTF-8     UTF-16BE  UTF-32BE
漢         6F22     E6 BC A2  6F 22     00 00 6F 22
汉         6C49     E6 B1 89  6C 49     00 00 6C 49
漢         FA47     EF A9 87  FA 47     00 00 FA 47
字         5B57     E5 AD 97  5B 57     00 00 5B 57
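The Unicode, UTF-8, UTF-16BE, and UTF-32BE columns of a table like this one can be recomputed from the code point alone. The following is a minimal Python sketch, not part of the original text; the helper name `unicode_forms` is hypothetical, and U+6C49 and U+5B57 serve only as sample code points.

```python
def unicode_forms(ch):
    """Return the code point and encoded bytes of one character as hex strings."""
    def hex_bytes(data):
        # Format each byte as two uppercase hex digits, separated by spaces.
        return " ".join(f"{b:02X}" for b in data)

    return {
        "Unicode": f"{ord(ch):04X}",
        "UTF-8": hex_bytes(ch.encode("utf-8")),
        "UTF-16BE": hex_bytes(ch.encode("utf-16-be")),
        "UTF-32BE": hex_bytes(ch.encode("utf-32-be")),
    }


# Escapes are used instead of literal ideographs so the source stays ASCII.
for ch in ("\u6c49", "\u5b57"):
    print(unicode_forms(ch))
# {'Unicode': '6C49', 'UTF-8': 'E6 B1 89', 'UTF-16BE': '6C 49', 'UTF-32BE': '00 00 6C 49'}
# {'Unicode': '5B57', 'UTF-8': 'E5 AD 97', 'UTF-16BE': '5B 57', 'UTF-32BE': '00 00 5B 57'}
```

For code points in the BMP, the UTF-16BE form is simply the code point as two bytes, and the UTF-32BE form pads it to four bytes with leading zeros.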
Charset Designations
In this section you will learn about the difference between a character set and an
encoding. You will also learn why this distinction is critical in several contexts,
one of which is information interchange, whether in the form of email or other
electronic media. To explicitly indicate the content of a document, such as an email
message or an HTML file, there is the notion of a “charset” (character set), which is
used as an identifier.
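To see a charset identifier at work in an email message, consider the following Python sketch, which uses the standard library `email` package. It is an illustration only: the sample text and the choice of `iso-2022-jp` are assumptions made for this example, not values taken from the text.

```python
from email.message import EmailMessage

# Build a message whose body is encoded as ISO-2022-JP. The charset name
# travels with the message as a parameter of the Content-Type header.
msg = EmailMessage()
msg.set_content("\u6f22\u5b57", charset="iso-2022-jp")  # the two kanji for "kanji"

print(msg["Content-Type"])        # header that carries the charset parameter
print(msg.get_content_charset())  # charset identifier as the receiver sees it
print(msg.get_content())          # body decoded using that identifier
```

The receiving side has no other reliable way to know how the bytes of the body map to characters, which is why the identifier accompanies the content.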
Character Sets Versus Encodings
The fundamental ways in which character sets differ from encodings, which are
especially clear in the context of CJKV, are as follows:
• Character sets, especially CJKV ones, can usually be encoded in more than one way.
Consider ISO-2022 and EUC encodings, both of which are commonly used to encode
most CJKV character sets.
• Most CJKV encodings support more than one character set. Consider EUC-JP for
Japan, which supports JIS X 0201-1997, JIS X 0208:1997, and JIS X 0212-1990 in a
mixed one-, two-, and three-byte encoding.
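Both points can be observed with Python's built-in codecs. The sketch below is illustrative and rests on assumptions: the codec names `iso2022_jp`, `euc_jp`, and `shift_jis` are Python's own, and the sample characters were chosen for this example rather than taken from the text.

```python
# Point 1: one character set (JIS X 0208), several encodings.
kanji = "\u6f22\u5b57"  # two JIS X 0208 kanji
for codec in ("iso2022_jp", "euc_jp", "shift_jis"):
    print(f"{codec:10} {kanji.encode(codec).hex(' ')}")
# iso2022_jp 1b 24 42 34 41 3b 7a 1b 28 42
# euc_jp     b4 c1 bb fa
# shift_jis  8a bf 8e 9a

# Point 2: one encoding (EUC-JP), several character sets.
samples = {
    "ASCII/JIS-Roman letter": "A",            # one byte
    "JIS X 0201 half-width kana": "\uff71",   # two bytes, led by SS2 (0x8E)
    "JIS X 0208 kanji": "\u6f22",             # two bytes
    "JIS X 0212 kanji": "\u4e02",             # three bytes, led by SS3 (0x8F)
}
euc_lengths = {}
for label, ch in samples.items():
    data = ch.encode("euc_jp")
    euc_lengths[label] = len(data)
    print(f"{label:28} {data.hex(' ')}")
```

In the first loop the ISO-2022-JP output wraps the same two-byte values in escape sequences, while EUC-JP sets the high bit of each byte and Shift-JIS applies an arithmetic transformation. In the second loop the lead byte alone tells a decoder which character set the following bytes belong to.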
Table 4-78 lists several CJKV encodings, along with the character sets tha ...