Unicode Normalization Forms
An encoding that provides so many alternative ways of representing the same character can give rise to text that is much harder to process than it needs to be. In particular, comparing strings for equality becomes a real challenge when significantly different sequences of bits are supposed to be treated as equal. There are two ways to deal with this problem: require that text always be normalized, that is, represented in a uniform manner, or normalize text at some well-defined point so as to simplify operations such as comparing for equality.
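To make the comparison problem concrete, here is a minimal sketch in Python using the standard `unicodedata` module. The helper name `equal_after_normalization` is hypothetical, and the choice of NFC as the default form is an assumption made for illustration; the text above does not prescribe a particular form.

```python
import unicodedata


def equal_after_normalization(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after converting both to the same normalization form.

    Hypothetical helper for illustration; `form` may be any form accepted by
    unicodedata.normalize ("NFC", "NFD", "NFKC", "NFKD").
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Two representations of the same character, "é":
precomposed = "\u00e9"   # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# A raw comparison sees two different code point sequences.
print(precomposed == decomposed)                           # False

# Normalizing both sides first makes them compare equal.
print(equal_after_normalization(precomposed, decomposed))  # True
```

Normalizing both operands at the point of comparison corresponds to the second approach described above; requiring that all stored text already be in one form would let the plain `==` comparison suffice.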
Of course, by defining something as the “canonical representation” of a particular idea, you essentially nominate it as the form to which you normalize. In this way, Unicode 1.x and 2.x could ...