Normalization-Related Properties

We have already looked at the Unicode normalization forms and the way in which Unicode normalization works. Much of the Unicode Character Database is given over to this important topic, so we'll take a closer look at the normalization-related properties here. For more information on decomposition and normalization, refer to Chapter 4. For in-depth information on implementing Unicode normalization, see Chapter 14.

Decomposition

As we saw earlier, many Unicode characters are said to “decompose.” That is, they're considered equivalent to (and, generally speaking, less preferable than) other Unicode characters or sequences of other Unicode characters. For those characters that decompose, the UnicodeData.txt file gives ...
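As a quick way to see decomposition in practice, here is a minimal sketch using Python's standard unicodedata module, which reports a character's decomposition mapping as a string of hexadecimal code points, optionally preceded by a tag in angle brackets. The helper name describe_decomposition is hypothetical and exists only for this illustration, and the results depend on the version of the Unicode data bundled with your Python build.

```python
import unicodedata


def describe_decomposition(ch):
    """Return (tag, code_points) for a character's decomposition mapping.

    Returns None when the character has no decomposition mapping.
    The tag is None for an untagged mapping, or a string such as
    '<compat>' when the mapping carries a tag.
    """
    field = unicodedata.decomposition(ch)
    if not field:
        return None  # the character does not decompose
    parts = field.split()
    tag = None
    if parts[0].startswith("<"):
        # A leading tag in angle brackets precedes the code points
        tag = parts[0]
        parts = parts[1:]
    return tag, [int(p, 16) for p in parts]


# U+00E9 LATIN SMALL LETTER E WITH ACUTE maps to U+0065 U+0301
print(describe_decomposition("\u00e9"))  # (None, [101, 769])

# U+FB01 LATIN SMALL LIGATURE FI carries a tagged mapping to U+0066 U+0069
print(describe_decomposition("\ufb01"))  # ('<compat>', [102, 105])

# A plain ASCII letter has no decomposition mapping
print(describe_decomposition("a"))  # None

# Normalizing to NFD applies the untagged mapping for U+00E9
print(unicodedata.normalize("NFD", "\u00e9") == "e\u0301")  # True
```

The helper reports only the single-step mapping recorded for one character. Full normalization also applies mappings recursively and reorders combining marks, which is why the last line relies on unicodedata.normalize rather than on the helper.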
