
Excessive Unification
The unification principles and practices have raised many objections. Unification pre-
vents people from making distinctions that they might wish to make at the level of plain
text. In some situations, a distinction could be made, but not reliably.
Problematic unification cases include the following (in addition to Han unification,
which was discussed earlier):
• The character ü as an independent letter indicating a particular sound (as in Ger-
man) versus ü as “u” to which a diacritic mark has been added (as in Spanish). Many
people regard these as different characters. In Unicode, you could try to distinguish
between the two by using the precomposed character U+00FC (Latin small letter
“u” with dieresis) in the first case and the two-character sequence U+0075 (Latin
small letter “u”) U+0308 (combining dieresis) in the second. However, these are
canonically equivalent, and you cannot expect software that conforms to the
Unicode standard to preserve the distinction. On the contrary, it normally shouldn’t,
and it normally doesn’t.
• The character æ is a separate letter in Danish and Norwegian. In some other con-
texts, including some styles of writing Latin words used in English, it is just a
ligature of “a” and “e” (as in “Cæsar” for “Caesar”). There is no way to make this
distinction in Unicode, although between the lines we can read the idea that liga-
tures should be handled ...
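
The canonical equivalence described in the ü item above can be checked with a short sketch. It assumes Python 3 and its standard unicodedata module; the helper name canonically_equivalent is hypothetical and serves only as an illustration. The last line also suggests why the æ case cannot be approached the same way: the character has no decomposition mapping at all.

```python
import unicodedata

# Two ways of writing "ü" that some people would like to keep distinct.
precomposed = "\u00FC"   # U+00FC, Latin small letter u with dieresis
decomposed = "u\u0308"   # U+0075 followed by U+0308 (combining dieresis)


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# As raw code point sequences, the two strings differ...
print(precomposed == decomposed)

# ...but they are canonically equivalent, so normalization erases the distinction.
print(canonically_equivalent(precomposed, decomposed))
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# The letter æ (U+00E6) has no decomposition mapping, so there is no
# alternative encoding that could be used to mark it as a ligature.
print(repr(unicodedata.decomposition("\u00E6")))
```

Any program that normalizes its input, as many do, will therefore treat the two spellings of ü as the same text, which is why the distinction cannot be relied on.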