Chapter 4. Combining Character Sequences and Unicode Normalization

One feature of Unicode we've talked a lot about is combining characters. Indeed, the ability to build a character from a base character plus one or more combining marks is one of the features that give Unicode its power. At the same time, it may be the single greatest contributor to Unicode's complexity. In this chapter, we'll take an in-depth look at combining characters and all of the issues that arise because of them.

Consider the following collection of characters:

These characters are the Latin vowels with various diacritical marks added to them. Most of these letters occur fairly often in various European languages. The group includes 20 letters, ...
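To make the idea of a vowel with a diacritical mark concrete, here is a minimal Python sketch using the standard `unicodedata` module. It assumes the letter é as an example and is not taken from the book; the names `precomposed`, `combining`, and `describe` are illustrative. It shows that the same visible letter can be stored either as a single code point or as a base letter followed by a combining mark, and that normalization converts between the two forms.

```python
import unicodedata

# Two ways to encode what a reader sees as the same letter.
precomposed = "\u00e9"   # single code point: e with acute
combining = "e\u0301"    # base letter e followed by a combining acute accent


def describe(text):
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


# The raw strings differ, even though they render alike.
same_raw = precomposed == combining

# NFC composes the sequence; NFD decomposes the single code point.
nfc_equal = unicodedata.normalize("NFC", combining) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == combining

print(describe(precomposed))
print(describe(combining))
print(same_raw, nfc_equal, nfd_equal)
```

A plain string comparison treats the two encodings as different, which is why text generally needs to be normalized to one form before it is compared or searched.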
