September 2002
896 pages
That brings us to UTF-16. UTF-16 is the oldest Unicode encoding form, although its name goes back only a few years.
UTF-16 maps the 21-bit abstract code point values to sequences of 16-bit code units. For code point values in the BMP, which represent the vast majority of characters in any typical written document, this is a straightforward mapping. You just lop the five zero bits off the top, as shown in Figure 6.1.
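The BMP case can be sketched in a few lines of Python. This is a minimal illustration, not a full encoder: the function name `bmp_to_utf16_unit` is hypothetical, and the sketch deliberately handles only code points that already fit in 16 bits.

```python
def bmp_to_utf16_unit(code_point: int) -> int:
    """Map a BMP code point to its single 16-bit UTF-16 code unit.

    Hypothetical helper for illustration; it covers only the BMP case.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("not a BMP code point")
    # Viewed as a 21-bit value, a BMP code point has five leading zero
    # bits. Dropping them leaves the 16-bit code unit unchanged in value.
    return code_point & 0xFFFF


euro = 0x20AC                      # U+20AC EURO SIGN, a BMP character
print(format(euro, "021b"))        # 21-bit view, starts with five zeros
print(format(bmp_to_utf16_unit(euro), "016b"))  # 16-bit code unit
```

Python's built-in codec agrees: `"\u20ac".encode("utf-16-be")` yields the two bytes `20 AC`, the same 16-bit value in big-endian order.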

For characters from the supplementary planes, the transformation is more complicated. To represent supplementary-plane characters, Unicode sets aside 2,048 ...
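For a concrete feel of the supplementary-plane case, here is a minimal Python sketch of the standard UTF-16 surrogate-pair arithmetic. It assumes the usual surrogate ranges (high surrogates U+D800 through U+DBFF, low surrogates U+DC00 through U+DFFF); the function name `to_surrogate_pair` is hypothetical.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into two 16-bit code units.

    Hypothetical helper for illustration, assuming the standard
    surrogate ranges D800-DBFF (high) and DC00-DFFF (low).
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000          # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1D11E)     # U+1D11E MUSICAL SYMBOL G CLEF
print(f"{high:04X} {low:04X}")             # D834 DD1E
```

The result matches Python's own codec: `"\U0001D11E".encode("utf-16-be")` produces the bytes `D8 34 DD 1E`.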