ISO 10646 and Unicode

It should be fairly clear by now that we have a real mess on our hands. Literally hundreds of different encoding standards exist, and many of them are redundant: they encode the same characters but assign them different values. Even the ISO 8859 family alone includes 10 different encodings of the Latin alphabet, each containing a slightly different set of letters.
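
To make the redundancy concrete, here is a small sketch in Python using the codecs in its standard library. The choice of the letter "é", the byte 0xA4, and the particular codecs are illustrative assumptions, not examples taken from the text. It shows one character receiving different byte values under different standards, and one byte value standing for different characters in two members of the ISO 8859 family.

```python
# One character, several standards: the byte value depends on the encoding.
CHARACTER = "é"
CODECS = ["iso8859_1", "cp437", "mac_roman", "utf_8"]
encoded = {codec: CHARACTER.encode(codec) for codec in CODECS}

# One byte, two readings: ISO 8859-1 and ISO 8859-15 disagree about 0xA4.
BYTE = b"\xa4"
decoded = {codec: BYTE.decode(codec) for codec in ["iso8859_1", "iso8859_15"]}

for codec, raw in encoded.items():
    print(f"{CHARACTER!r} in {codec}: {raw.hex()}")

for codec, text in decoded.items():
    print(f"byte a4 read as {codec}: U+{ord(text):04X}")
```

In ISO 8859-1 the byte 0xA4 is the generic currency sign (U+00A4); in ISO 8859-15 the same byte is the euro sign (U+20AC). Nothing in the byte itself says which reading is intended.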

As a consequence, you have to be very explicit about which encoding scheme you're using, lest the computer interpret your text as characters other than the ones you intend, garbling it in the process (log onto a Japanese Web site with an American computer, for example, and it's likely you'll see garbage rather than Japanese). About the only thing that's safely interchangeable ...
