
Some diacritic marks have a regular appearance that deviates from what you might
expect from their name. The Latin capital letter “T” with caron, used in Czech and
Slovak, looks as you’d expect: Ť. However, its lowercase counterpart, Latin small letter
“t” with caron U+0165, has a comma-like diacritic in most fonts: ť. The diacritic mark
thus looks like a comma or an apostrophe, yet it is called a caron and treated as a caron
in Unicode (e.g., in the canonical decomposition). Although this sounds unnatural, it
would be equally unnatural to have “T” with caron mapped to, say, “t” with comma above
right in an uppercase-to-lowercase mapping.
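The Unicode data itself records this pairing. The following sketch uses Python's standard `unicodedata` module (the constant names and the `describe` helper are illustrative, not from any standard). It shows that both letters decompose canonically into a base letter plus the same COMBINING CARON (U+030C), and that plain case mapping pairs Ť directly with ť, whatever shape a font gives the lowercase diacritic.

```python
import unicodedata

SMALL_T_CARON = "\u0165"    # ť, drawn with a comma-like mark in most fonts
CAPITAL_T_CARON = "\u0164"  # Ť

def describe(ch: str) -> None:
    """Print the code point, Unicode name and decomposition of a character."""
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)}")

describe(SMALL_T_CARON)    # U+0165 LATIN SMALL LETTER T WITH CARON: 0074 030C
describe(CAPITAL_T_CARON)  # U+0164 LATIN CAPITAL LETTER T WITH CARON: 0054 030C

# After canonical decomposition (NFD), the mark is the ordinary combining caron.
nfd_small = unicodedata.normalize("NFD", SMALL_T_CARON)
print(unicodedata.name(nfd_small[1]))  # COMBINING CARON

# Case mapping pairs the two precomposed letters directly.
print(CAPITAL_T_CARON.lower() == SMALL_T_CARON)  # True
```

The glyph difference is purely a matter of font design. Nothing in the decomposition or case-mapping data distinguishes the lowercase caron from the uppercase one.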
Spacing Diacritic Marks
When a combining diacritic mark is applied to a space character, we get the diacritic
itself as a visible character. Alternatively, we might use a character that itself represents
a spacing diacritic mark. Such characters, often called “spacing clones” of diacritic marks,
appear for historical reasons in different blocks, such as Latin-1 Supplement and
Spacing Modifier Letters.
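The link between a spacing clone and its combining counterpart can be inspected with Python's standard `unicodedata` module. The sketch below is illustrative (the selection of characters and variable names is arbitrary). It lists a few spacing clones from the Latin-1 Supplement (U+00B4, U+00A8) and Spacing Modifier Letters (U+02D8, U+02DD) blocks. Each has a compatibility decomposition into a space followed by the corresponding combining mark. It also builds a standalone acute accent both ways: on a space and on a no-break space.

```python
import unicodedata

# A few spacing clones: ´ ¨ (Latin-1 Supplement) and ˘ ˝ (Spacing Modifier Letters).
SPACING_CLONES = ["\u00b4", "\u00a8", "\u02d8", "\u02dd"]

for ch in SPACING_CLONES:
    # Each decomposition is tagged <compat>: space U+0020 plus a combining mark.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)}")
# U+00B4 ACUTE ACCENT: <compat> 0020 0301
# U+00A8 DIAERESIS: <compat> 0020 0308
# U+02D8 BREVE: <compat> 0020 0306
# U+02DD DOUBLE ACUTE ACCENT: <compat> 0020 030B

COMBINING_ACUTE = "\u0301"
acute_on_space = "\u0020" + COMBINING_ACUTE   # combining acute applied to a space
acute_on_nbsp = "\u00a0" + COMBINING_ACUTE    # combining acute applied to a no-break space

# NFKD maps the spacing clone to the space-based sequence, not the NBSP-based one.
print(unicodedata.normalize("NFKD", "\u00b4") == acute_on_space)  # True
print(unicodedata.normalize("NFKD", "\u00b4") == acute_on_nbsp)   # False
```

Because these decompositions are compatibility mappings rather than canonical ones, NFC and NFD leave a spacing clone such as U+00B4 unchanged. Only the NFKC and NFKD forms replace it.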
Starting from Unicode 4.1, the recommendation is to apply a combining diacritic
mark to a no-break space U+00A0 rather than to a space U+0020. The reason is “potential
conflicts with the handling of sequences of U+0020 space characters in contexts like
XML.” However, the formal definitions still define decompositions using the space.
For example, the acute ...