The Challenge of Representing Text in Computers

The main body of the Unicode standard is 1,040 pages long, counting indexes and appendices, and there's even more supplemental information—addenda, data tables, related substandards, implementation notes, and so on—on the Unicode Consortium's Web site. That's an awful lot of verbiage. And now this book offers another 800 pages on the subject. Why? After all, Unicode's just a character encoding. Sure, it includes a lot of characters, but how hard can it be?

Let's look at this issue for a few minutes. The basic principle is simple: If you want to be able to represent textual information in a computer, you make a list of all the characters you want to represent and assign a number to each one.[2]
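To make that principle concrete, here is a minimal sketch in Python. The five-character repertoire and the names `REPERTOIRE`, `CHAR_TO_CODE`, `encode`, and `decode` are invented for illustration and do not correspond to any real encoding; the only point is that an encoding is, at bottom, a list of characters plus a number for each one.

```python
# Hypothetical toy repertoire: every character we want to be able to represent.
REPERTOIRE = ["A", "B", "C", " ", "!"]

# Assign each character a number: here, simply its position in the list.
CHAR_TO_CODE = {ch: number for number, ch in enumerate(REPERTOIRE)}


def encode(text):
    """Turn a string into the list of numbers assigned to its characters."""
    # Raises KeyError for any character outside the repertoire.
    return [CHAR_TO_CODE[ch] for ch in text]


def decode(codes):
    """Turn a list of numbers back into the string they represent."""
    return "".join(REPERTOIRE[code] for code in codes)


print(encode("CAB!"))          # [2, 0, 1, 4]
print(decode([2, 0, 1, 4]))    # CAB!
```

Any character missing from the list cannot be represented at all, which is one reason the size of the list matters so much.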
