September 2002
896 pages
We'll start by looking at Unicode-to-Unicode transformations. As we've seen, the Unicode standard comprises a single coded character set, but multiple encoding forms:
UTF-32 represents each 21-bit code point value using a single 32-bit code unit.
UTF-16 represents each 21-bit code point value using either a single 16-bit code unit (for code points in the BMP) or a pair of 16-bit code units (for code points in the supplementary planes).
UTF-8 represents each 21-bit code point value with a single 8-bit code unit (for code points in the ASCII block), a sequence of two or three 8-bit code units (for code points in the rest of the BMP), or a sequence of four 8-bit code units (for code points in the supplementary ...
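As a rough illustration of the three encoding forms listed above, the following sketch counts how many code units a single code point occupies in each form. It relies on Python's built-in codecs; the helper name code_unit_counts and the sample characters are chosen here for illustration and are not taken from the text.

```python
def code_unit_counts(ch: str) -> dict:
    """Count the code units one code point needs in each Unicode encoding form."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    return {
        # The big-endian codecs emit no byte order mark, so the byte length
        # divided by the code unit size gives the code unit count directly.
        "UTF-32": len(ch.encode("utf-32-be")) // 4,  # 32-bit code units
        "UTF-16": len(ch.encode("utf-16-be")) // 2,  # 16-bit code units
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit code units
    }


# Illustrative samples: ASCII, two more BMP characters, one supplementary-plane character.
samples = {
    "U+0041 LATIN CAPITAL LETTER A": "A",
    "U+00E9 LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "U+20AC EURO SIGN": "\u20ac",
    "U+1D11E MUSICAL SYMBOL G CLEF": "\U0001D11E",
}

for name, ch in samples.items():
    print(name, code_unit_counts(ch))
```

Under these assumptions, UTF-32 always reports one code unit, UTF-16 reports two only for the supplementary-plane sample, and UTF-8 ranges from one to four.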