
Chapter 1: CJKV Information Processing Overview
Encoding Methods
Encoding is the process of mapping a character to a numeric value, or more precisely,
assigning a numeric value to a character. By doing this, you create the ability to uniquely
identify a character through its associated numeric value. The more unique a value is
among different encoding methods, the more likely it is that character identification will
be unambiguous. Ultimately, the computer needs to manipulate the character as a numeric
value. Independent of any CJKV language or computerized implementations thereof,
indexing encoded values allows a numerically enforced ordering to be mapped onto what
might otherwise be a randomly ordered natural language.
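To make the idea concrete, here is a minimal sketch, assuming Python 3.9 or later and its standard-library codecs. It encodes a single Hangul syllable, 한 (U+D55C), with several Korean encoding methods and the three Unicode encoding forms, showing that the same character receives a different numeric value (byte sequence) in each. It then sorts a few characters purely by code point. The codec names are Python's own (`cp949` is Python's name for UHC); the helper functions `encoded_values` and `code_point_order` are hypothetical names chosen only for illustration.

```python
# Python standard-library codec names. Each stands in for an encoding method;
# "cp949" is Python's name for Unified Hangul Code (UHC).
ENCODINGS = [
    "euc_kr",
    "cp949",
    "johab",
    "iso2022_kr",
    "utf-8",
    "utf-16-be",
    "utf-32-be",
]


def encoded_values(ch: str) -> dict[str, str]:
    """Return the hex bytes that each encoding in ENCODINGS assigns to ch."""
    return {name: ch.encode(name).hex(" ").upper() for name in ENCODINGS}


def code_point_order(chars: list[str]) -> list[str]:
    """Sort characters by Unicode scalar value: a purely numeric ordering."""
    return sorted(chars, key=ord)


if __name__ == "__main__":
    print(f"U+{ord('한'):04X}")  # prints U+D55C
    for name, value in encoded_values("한").items():
        print(f"{name:>10}: {value}")
    print(code_point_order(["하", "가", "나"]))  # prints ['가', '나', '하']
```

Sorting with `key=ord` is the kind of numerically enforced ordering described above. For Hangul syllables it happens to match dictionary order because Unicode arranges them that way, but for most scripts raw code point order is generally not a linguistically correct collation.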
While there is no universally recognized encoding method, many have been in common
use, such as ISO-2022-KR, EUC-KR, Johab, and Unified Hangul Code (UHC) for Korean.
Although Unicode does not employ a single encoding form, it is safe to state that its
encoding forms, UTF-8, UTF-16, and UTF-32, have become universally recognized. In
addition, each one has become the preferred encoding form for specific uses. For the Web,
UTF-8 is the most common encoding form. Applications prefer to use the UTF-16 encoding
form internally because of its overall space-efficiency for the majority
of multilingual text. OpenType fonts, when they include glyphs that map from ch