The simplest (and newest) of the Unicode encoding forms is UTF-32, which was first defined in Unicode Standard Annex #19 (now officially part of Unicode 3.1). To go from the 21-bit abstract code point value to UTF-32, you simply zero-pad the value out to 32 bits.
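
To make the zero-padding concrete, here is a minimal sketch in Python. The function name `encode_utf32_be` is hypothetical, and the sketch assumes big-endian byte order with no byte-order mark; neither choice comes from the text above. It also does not reject surrogate code points, which a real encoder would have to do.

```python
import struct

def encode_utf32_be(text: str) -> bytes:
    """Encode text as UTF-32, big-endian, without a byte-order mark.

    Each code point becomes exactly one 32-bit code unit: the code point
    value itself, zero-padded on the left to fill 32 bits.
    """
    # ">I" packs an unsigned 32-bit integer in big-endian byte order.
    return b"".join(struct.pack(">I", ord(ch)) for ch in text)

# U+0041 (LATIN CAPITAL LETTER A) followed by U+1F600, a code point
# outside the Basic Multilingual Plane.
encoded = encode_utf32_be("A\U0001F600")
print(encoded.hex())  # → 000000410001f600
```

Both code points occupy four bytes, regardless of how large the code point value is. For well-formed text the result should match what Python's built-in `"utf-32-be"` codec produces.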

UTF-32 exists for three basic reasons:

  1. It's the Unicode standard's counterpart to UCS-4, the four-byte format from ISO 10646.

  2. It provides a way to represent every Unicode code point value with a single code unit, which can make for simpler implementations.

  3. It can be useful as an in-memory format on systems with a 32-bit word length. Some systems either don't give you a way to access individual bytes of a 32-bit word or impose a performance penalty for doing so. If memory is cheap, ...
