Multiple-Byte Encoding Systems

These distinctions between layers become more interesting when you turn to East Asian languages such as Chinese, Japanese, and Korean. All three languages use Chinese characters. No one is really sure how many Chinese characters exist, although the total probably exceeds 100,000. Most Chinese speakers have a working written vocabulary of about 5,000 characters. Japanese and Korean speakers, who rely more on auxiliary writing systems to supplement the Chinese characters, have a somewhat smaller working vocabulary of them.
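To see why character counts of this size push an encoding beyond one byte per character, here is a minimal arithmetic sketch. The function name `min_bytes_for` is hypothetical, and the calculation assumes a fixed-width code that uses every possible byte value, which real East Asian encodings generally do not, so treat the results as lower bounds rather than a description of any actual encoding.

```python
def min_bytes_for(repertoire_size: int) -> int:
    """Return the fewest whole bytes that give every character a distinct
    fixed-width code, assuming all byte values are usable (an idealization)."""
    if repertoire_size < 1:
        raise ValueError("repertoire_size must be at least 1")
    # Bits needed to number the characters 0 .. repertoire_size - 1.
    bits = max(1, (repertoire_size - 1).bit_length())
    # Round up to a whole number of 8-bit bytes.
    return (bits + 7) // 8


# Sizes taken from the paragraph above, plus the one-byte limit for contrast.
for label, size in [
    ("one-byte limit", 256),
    ("working vocabulary", 5_000),
    ("estimated total", 100_000),
]:
    print(f"{label}: {size} characters need at least {min_bytes_for(size)} byte(s)")
```

Under these assumptions, a working vocabulary of about 5,000 characters already exceeds the 256 values one byte can hold, and a repertoire above 65,536 characters no longer fits in two.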

East Asian Coded Character Sets

With that many characters, you must start by officially defining particular sets of Chinese characters in which you're interested. The Japanese government, ...
