
Input Methods
|
11
In an attempt to allow for a mixture of one- and two-byte characters, several CJKV encod-
ing methods have been developed. As you will learn in Chapter 4, these encoding meth-
ods are largely, but not completely, compatible. You will also see that there are encoding
methods that use three or even four bytes to represent a single character!
e most common Japanese encoding methods are ISO-2022-JP, Shi-JIS, and EUC-JP.
ISO-2022-JP, the most basic, uses seven-bit bytes (or, seven bits of a byte) to represent
characters, and requires special characters or sequences of characters (called shiing char-
acters or escape sequences) to shi between one- and two-byte modes. Shi-JIS and EUC-
JP encodings make generous use of eight-bit characters, and use the value of the rst byte
as the way to distinguish one- and multiple-byte characters. Now, Unicode has become
the most common encoding for Japanese.
Input Methods
ose who type English text have the luxury of using keyboards that can hold all the keys
to represent a sucient number of characters. CJKV characters number in the thousands,
though, so how does one type CJKV text? Large keyboards that hold thousands of indi-
vidual keys exist, but they require special training and are dicult to use. is has led to
soware solutions: input methods and the conversion dictionaries that they employ.
Most Chinese and Japanese text i ...