3.2. Other general properties

Scanning the categories and subcategories described in the previous section, we quickly notice that many properties are missing from the categorization. Another file at the Unicode site, PropList.txt, makes up for this deficiency by introducing a number of properties that are orthogonal to the notion of category.

Here is a snippet of the file, showing the characters that have the property of being "spaces":

   0009..000D     ; White_Space # Cc   [5] <control-0009>..<control-000D>
   0020           ; White_Space # Zs       SPACE
   0085           ; White_Space # Cc       <control-0085>
   00A0           ; White_Space # Zs       NO-BREAK SPACE
   1680           ; White_Space # Zs       OGHAM SPACE MARK
   180E           ; White_Space # Zs       MONGOLIAN VOWEL SEPARATOR
   2000..200A     ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
   2028           ; White_Space # Zl       LINE SEPARATOR
   2029           ; White_Space # Zp       PARAGRAPH SEPARATOR
   202F           ; White_Space # Zs       NARROW NO-BREAK SPACE
   205F           ; White_Space # Zs       MEDIUM MATHEMATICAL SPACE
   3000           ; White_Space # Zs       IDEOGRAPHIC SPACE

At the start of each line, we see the code point or range of code points concerned. The name of the property appears after the semicolon. Everything after the pound sign is a comment: it gives the category of the characters, then, for a range, the number of characters in brackets, and finally the character's name or, for a range, the names of its endpoints.
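The line format just described is simple enough to read mechanically. The following Python sketch is one possible way to do so, not an official parser: the names `PropEntry`, `parse_proplist`, `code_points_with`, and `SAMPLE` are invented for this illustration, and the regular expression assumes that every data line looks like the ones in the snippet above.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Set


class PropEntry(NamedTuple):
    """One data line: an inclusive code point range, a property, a comment."""
    first: int
    last: int
    prop: str
    comment: str


# Assumed line shape: HEX[..HEX] ; Property_Name [# comment]
_LINE = re.compile(
    r"^\s*([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)\s*(?:#\s*(.*))?$"
)


def parse_proplist(lines: Iterable[str]) -> Iterator[PropEntry]:
    """Yield a PropEntry for every data line; skip blanks and pure comments."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = _LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # not a line in the assumed format
        first = int(match.group(1), 16)
        # A single code point is treated as a range of length one.
        last = int(match.group(2), 16) if match.group(2) else first
        comment = (match.group(4) or "").strip()
        yield PropEntry(first, last, match.group(3), comment)


def code_points_with(entries: Iterable[PropEntry], prop: str) -> Set[int]:
    """Expand the ranges carrying the given property into a set of code points."""
    return {
        cp
        for entry in entries
        if entry.prop == prop
        for cp in range(entry.first, entry.last + 1)
    }


# Three lines taken from the snippet above.
SAMPLE = """\
   0009..000D     ; White_Space # Cc   [5] <control-0009>..<control-000D>
   0020           ; White_Space # Zs       SPACE
   2000..200A     ; White_Space # Zs  [11] EN QUAD..HAIR SPACE
"""

if __name__ == "__main__":
    entries = list(parse_proplist(SAMPLE.splitlines()))
    spaces = code_points_with(entries, "White_Space")
    print(sorted(f"{cp:04X}" for cp in spaces))
```

Expanding ranges into a set keeps membership tests trivial; for properties that cover very large ranges, keeping the (first, last) pairs and searching them would use less memory.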

Of these properties, which number 28 in all, here are the general-purpose ones. We shall see the others later when we discuss case, the bidirectional algorithm, etc.

3.2.1. Spaces
