Chapter 36. Unicode
Unicode is a system intended to let computers represent and display all the written languages on Earth. This is a very ambitious goal, but the Unicode Consortium (http://www.unicode.org) has been surprisingly successful. Work on the project began in 1987 as an extension of 8-bit ASCII to allow representation of the most common European languages that use the Latin, Greek, and Cyrillic alphabets.
This quickly evolved into “Unicode 88” (http://www.unicode.org/history/unicode88.pdf), which uses a 16-bit encoding for European and Asian languages. When the first byte of a code is all zeros, the second byte matches the corresponding ASCII character.
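The relationship between the 16-bit codes and ASCII can be checked with a minimal Python sketch. It assumes that Python's built-in "utf-16-be" codec is an acceptable stand-in for a fixed 16-bit encoding, which holds only for characters that fit in a single 16-bit unit. The helper name split_16bit is hypothetical and is not taken from the standard.

```python
def split_16bit(ch):
    """Return (first_byte, second_byte) of one character's 16-bit big-endian code.

    Assumption: "utf-16-be" stands in for a fixed 16-bit encoding; characters
    that need more than one 16-bit unit are rejected.
    """
    encoded = ch.encode("utf-16-be")
    if len(encoded) != 2:
        raise ValueError("character does not fit in a single 16-bit code")
    return encoded[0], encoded[1]


# An ASCII letter: the first byte is zero, the second byte is the ASCII code.
first, second = split_16bit("A")
print(first, second, ord("A".encode("ascii")))

# Greek and Cyrillic letters use a nonzero first byte.
print(split_16bit("Ω"))
print(split_16bit("Я"))
```

For "A" the first byte is 0 and the second is 65, the same value ASCII assigns to that letter, while the Greek and Cyrillic letters carry a nonzero first byte.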
We are now on the Unicode 5.1 standard, which is available in print and on Web sites. The standard defines the abstract idea of ...
