Unicode Demystified by Richard Gillam

UTF-8

UTF-8 is the 8-bit Unicode encoding form. It was designed to allow Unicode to be used in places that support only 8-bit character encodings. A Unicode code point is represented using a sequence of anywhere from one to four 8-bit code units.
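To make the "one to four code units" rule concrete, here is a minimal Python sketch of encoding a single code point. The bit layouts (0xxxxxxx, 110xxxxx 10xxxxxx, and so on) come from the standard UTF-8 definition in RFC 3629, not from this excerpt, and `utf8_encode` is a hypothetical helper name used only for illustration. The result is cross-checked against Python's built-in `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (illustrative sketch)."""
    # Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not encodable.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (7 bits, identical to ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx (11 bits)
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (16 bits)
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (21 bits)
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in "A\u00e9\u20ac\U0001F600":
    encoded = utf8_encode(ord(ch))
    # Cross-check against Python's built-in UTF-8 codec.
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")
```

The four sample characters (A, é, €, and an emoji) were chosen so that each sequence length from one to four bytes appears once.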

One vitally important property of UTF-8 is that it's 100 percent backward compatible with ASCII. That is, valid 7-bit ASCII text is also valid UTF-8 text. As a consequence, UTF-8 can be used in any environment that supports 8-bit ASCII-derived encodings, and that environment will be able to correctly interpret and display the 7-bit ASCII characters. (The characters represented by byte values where the most significant bit is set, of course, aren't backward compatible; they have a different representation ...
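The backward-compatibility claim can be checked directly with Python's built-in codecs. This is a minimal sketch: ISO 8859-1 (Python's `"latin-1"` codec) stands in as an example of an 8-bit ASCII-derived encoding, and `all_high_bit` is a hypothetical helper name. The exhaustive loop over every encodable code point takes a second or two.

```python
# Sketch: checking UTF-8's ASCII compatibility with Python's built-in codecs.

# 1. All 128 seven-bit ASCII byte values decode to the same characters
#    whether they are read as ASCII or as UTF-8.
ascii_bytes = bytes(range(0x80))
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")

# 2. Characters outside ASCII get a different representation: in ISO 8859-1,
#    an 8-bit ASCII-derived encoding, U+00E9 is one byte; in UTF-8 it is two.
assert "\u00e9".encode("latin-1") == b"\xe9"
assert "\u00e9".encode("utf-8") == b"\xc3\xa9"

# 3. Every byte in the UTF-8 form of a non-ASCII character has its most
#    significant bit set, so it can never be mistaken for an ASCII character.
def all_high_bit(ch: str) -> bool:
    return all(b & 0x80 for b in ch.encode("utf-8"))

for cp in range(0x80, 0x110000):
    if 0xD800 <= cp <= 0xDFFF:
        continue  # surrogate code points have no UTF-8 encoding
    assert all_high_bit(chr(cp)), hex(cp)

print("all checks passed")
```

Point 3 is what lets an ASCII-only environment pass UTF-8 text through safely: a byte below 0x80 always means the ASCII character it appears to be.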