General Category
After the code point value and the name, the most important property of a Unicode character is its general category. There are seven primary categories: letter, number, punctuation, symbol, mark, separator, and miscellaneous. Each is subdivided into more specific categories.
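As a minimal sketch of how the primary categories and their subdivisions show up in practice, the snippet below uses Python's standard `unicodedata` module, which reports the general category as a two-letter code. The first letter identifies the primary category and the second the subdivision. The `PRIMARY` table and the `primary_category` helper are illustrative names, not part of any library, and the labels follow the wording used here (the code `C` is labeled "miscellaneous").

```python
import unicodedata

# Hypothetical lookup table: first letter of the two-letter category code
# mapped to the primary category names used in this chapter.
PRIMARY = {
    "L": "letter",
    "N": "number",
    "P": "punctuation",
    "S": "symbol",
    "M": "mark",
    "Z": "separator",
    "C": "miscellaneous",
}


def primary_category(ch):
    """Return (primary category name, full two-letter code) for one character."""
    code = unicodedata.category(ch)
    return PRIMARY[code[0]], code


if __name__ == "__main__":
    for ch in ["A", "5", ",", "+", "\u0301", " ", "\n"]:
        name, code = primary_category(ch)
        print(f"U+{ord(ch):04X} {code} {name}")
```

The two-letter code is what most libraries expose; mapping it back to a primary category is just a matter of reading its first letter.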
Letters
The Unicode standard uses the term “letter” rather loosely in assigning things to this general category. Whatever counts as the basic unit of meaning in a particular writing system, whether it represents a phoneme, a syllable, or a whole word or idea, is assigned to the “letter” category. The major exception to this rule comprises marks that combine typographically with other characters, which are categorized as “marks” instead of “letters.” They ...
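To illustrate the loose sense of "letter" described above, here is a small sketch, again assuming Python's standard `unicodedata` module. It checks a Latin letter (a phoneme), a Hiragana character (a syllable), and a Han ideograph (a word or idea), all of which fall in the letter category, alongside a combining acute accent, which is categorized as a mark. The `is_letter` helper is a hypothetical name introduced only for this example.

```python
import unicodedata


def is_letter(ch):
    """True if the character's general category code starts with 'L'."""
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    samples = [
        "a",       # Latin small letter: represents a phoneme
        "\u3042",  # Hiragana letter A: represents a syllable
        "\u4e2d",  # Han ideograph: represents a word or idea
        "\u0301",  # combining acute accent: a mark, not a letter
    ]
    for ch in samples:
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} letter={is_letter(ch)}")
```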