3.2. Other general properties
Scanning the categories and subcategories described in the previous section, we quickly notice that many properties are omitted from the categorization. Another file at the Unicode site, by the name of PropList.txt, makes up for this deficiency by introducing a certain number of properties that are orthogonal to the notion of category.
Here is a snippet of the file, showing the characters that have the property of being "spaces":
0009..000D ; White_Space # Cc [5] <control-0009>..<control-000D>
0020 ; White_Space # Zs SPACE
0085 ; White_Space # Cc <control-0085>
00A0 ; White_Space # Zs NO-BREAK SPACE
1680 ; White_Space # Zs OGHAM SPACE MARK
180E ; White_Space # Zs MONGOLIAN VOWEL SEPARATOR
2000..200A ; White_Space # Zs [11] EN QUAD..HAIR SPACE
2028 ; White_Space # Zl LINE SEPARATOR
2029 ; White_Space # Zp PARAGRAPH SEPARATOR
202F ; White_Space # Zs NARROW NO-BREAK SPACE
205F ; White_Space # Zs MEDIUM MATHEMATICAL SPACE
3000 ; White_Space # Zs IDEOGRAPHIC SPACE
At the start of each line, we see the code points or ranges concerned. The name of the property appears after the semicolon. Everything after the pound sign is a comment; this section contains the character's category and its name or, when there are multiple characters, the names of the endpoints of the range.
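The line format just described is simple enough to read mechanically. The following Python sketch is only an illustration, not part of the Unicode distribution: the names parse_proplist, has_property, and SAMPLE are hypothetical, and the sample text is a three-line excerpt standing in for the real file. It assumes each data line has the shape "code point or range ; property # comment".

```python
import re
from typing import Iterable, Iterator, List, Tuple

# One data line: a hex code point, optionally "..", a second hex code point,
# then a semicolon and the property name. (Assumed shape, per the text above.)
_ENTRY = re.compile(
    r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)"
)

Entry = Tuple[int, int, str]  # (first code point, last code point, property)


def parse_proplist(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield (first, last, property) for each data line; skip everything else."""
    for line in lines:
        # Everything after the pound sign is a comment.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        match = _ENTRY.match(data)
        if match is None:
            continue
        first = int(match.group(1), 16)
        # A single code point is treated as a range of length one.
        last = int(match.group(2), 16) if match.group(2) else first
        yield first, last, match.group(3)


def has_property(code_point: int, prop: str, entries: List[Entry]) -> bool:
    """Return True if code_point falls in any range listed for prop."""
    return any(p == prop and a <= code_point <= b for a, b, p in entries)


# Hypothetical excerpt standing in for the real file.
SAMPLE = """\
0009..000D ; White_Space # Cc [5] <control-0009>..<control-000D>
0020 ; White_Space # Zs SPACE
2000..200A ; White_Space # Zs [11] EN QUAD..HAIR SPACE
"""

entries = list(parse_proplist(SAMPLE.splitlines()))
print(has_property(0x0A, "White_Space", entries))
print(has_property(0x41, "White_Space", entries))
```

Storing single code points as one-element ranges keeps the lookup uniform; for the full file, a sorted list with binary search would likely be preferable to the linear scan used here.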
Of these properties, which number 28 in all, here are the general-purpose ones. We shall see the others later when we discuss case, the bidirectional algorithm, etc.
3.2.1. Spaces