Unicode Demystified by Richard Gillam

Converting Between Unicode Encoding Forms

We'll start by looking at Unicode-to-Unicode transformations. As we've seen, the Unicode standard comprises a single coded character set, but multiple encoding forms:

  • UTF-32 represents each 21-bit code point value using a single 32-bit code unit.

  • UTF-16 represents each 21-bit code point value using either a single 16-bit code unit (for code points in the BMP) or a pair of 16-bit code units (for code points in the supplementary planes).

  • UTF-8 represents each 21-bit code point value with a single 8-bit code unit (for code points in the ASCII block), a sequence of two or three 8-bit code units (for code points in the rest of the BMP), or a sequence of four 8-bit code units (for code points in the supplementary ...
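The three encoding forms can be sketched as small functions that map a single code point value to its code units. This is a minimal illustration written for this section, not code from the book. The function names (`to_utf32`, `to_utf16`, `to_utf8`, `from_utf16`, `utf16_to_utf8`) are hypothetical, and code units are modeled as plain Python integers in lists. The sketch assumes well-formed input: it rejects surrogate code points and values above U+10FFFF, and it does not range-check UTF-16 code units. Converting between two forms, such as UTF-16 to UTF-8, goes through the code point value.

```python
def _check(cp: int) -> None:
    # Only Unicode scalar values (U+0000..U+10FFFF minus surrogates) are encodable.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")

def to_utf32(cp: int) -> list[int]:
    # UTF-32: a single 32-bit code unit holding the code point value itself.
    _check(cp)
    return [cp]

def to_utf16(cp: int) -> list[int]:
    _check(cp)
    if cp < 0x10000:
        return [cp]                      # BMP: one 16-bit code unit
    v = cp - 0x10000                     # 20-bit offset into the supplementary planes
    return [0xD800 | (v >> 10),          # high surrogate carries the top 10 bits
            0xDC00 | (v & 0x3FF)]        # low surrogate carries the bottom 10 bits

def to_utf8(cp: int) -> list[int]:
    _check(cp)
    if cp < 0x80:                        # ASCII block: one byte
        return [cp]
    if cp < 0x800:                       # rest of the BMP, first part: two bytes
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp < 0x10000:                     # rest of the BMP: three bytes
        return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
    return [0xF0 | (cp >> 18),           # supplementary planes: four bytes
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F)]

def from_utf16(units: list[int]) -> list[int]:
    # Decode 16-bit code units back into code point values.
    out, i = [], 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            raise ValueError(f"unpaired surrogate at index {i}")
        else:
            out.append(u)
            i += 1
    return out

def utf16_to_utf8(units: list[int]) -> list[int]:
    # Unicode-to-Unicode conversion: decode to code points, then re-encode.
    return [b for cp in from_utf16(units) for b in to_utf8(cp)]

if __name__ == "__main__":
    for cp in (0x41, 0x20AC, 0x1D11E):
        print(f"U+{cp:04X}",
              "UTF-32:", [f"{u:08X}" for u in to_utf32(cp)],
              "UTF-16:", [f"{u:04X}" for u in to_utf16(cp)],
              "UTF-8:", [f"{u:02X}" for u in to_utf8(cp)])
```

For U+1D11E, which lies in a supplementary plane, the sketch yields the surrogate pair D834 DD1E in UTF-16 and the four bytes F0 9D 84 9E in UTF-8. In both forms the code point needs more than one code unit, while UTF-32 still uses just one.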
