Diacritical Marks
One of the principles underlying Unicode is dynamic composition: you can represent the marked form of a letter using two code points, one representing the letter, followed by another representing the mark. Quite a few of the letters in the Latin blocks are marked forms, that is, base letters with some kind of mark applied. All of these characters can be represented as a base letter followed by its combining mark or marks (two code points when the letter carries a single mark). To make this possible, Unicode includes a whole block of characters, the Combining Diacritical Marks block, which runs from U+0300 to U+036F. The characters in this block are special because they have combining semantics: they always modify the character that precedes them in storage. As a consequence, these characters are generally ...
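As a rough illustration of dynamic composition, the following Python sketch compares a precomposed letter with its base-plus-mark spelling. It relies on the standard `unicodedata` module; the helper name `in_combining_diacritical_marks` is hypothetical, and the use of the NFC and NFD normalization forms to show the two spellings are equivalent is an addition that the text above does not discuss.

```python
import unicodedata

# Precomposed form: one code point, U+00E9 LATIN SMALL LETTER E WITH ACUTE.
precomposed = "\u00e9"

# Dynamically composed form: the base letter "e" followed by
# U+0301 COMBINING ACUTE ACCENT. The mark comes after the letter it modifies.
composed = "e\u0301"

# The two strings render alike but differ in storage.
print(len(precomposed), len(composed))
print(precomposed == composed)

# Normalization (an assumption of this sketch, not covered in the text)
# converts between the two spellings.
print(unicodedata.normalize("NFC", composed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == composed)


def in_combining_diacritical_marks(ch: str) -> bool:
    """Hypothetical helper: True if ch lies in the block U+0300..U+036F."""
    return 0x0300 <= ord(ch) <= 0x036F


for ch in composed:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "in block:", in_combining_diacritical_marks(ch),
        # A nonzero combining class marks a character that attaches
        # to the character preceding it.
        "combining class:", unicodedata.combining(ch),
    )
```

Because the two spellings are different code point sequences, a plain string comparison treats them as unequal; code that compares user-visible text would typically normalize both sides first.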