
CHAPTER 5
Properties of Characters
Unicode contains about 100,000 characters and is still growing. To manage such a multitude
of characters, we need to assign them useful properties, both classifying and otherwise.
The Unicode standard defines a large number of properties, relating to matters such as
decompositions, collation, sorting, directionality, and line breaking, as well as Unicode
normalization forms. Some of the properties are answers to simple questions like “Is
the character a digit?” or (for letters) “What is the corresponding uppercase letter?”
Many properties are more technical and intended for use in formal specifications and
in programming.
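As an illustration of the simple kind of property mentioned above, the sketch below uses Python's standard `unicodedata` module, which exposes part of the Unicode Character Database for the Unicode version bundled with the Python release. The helper name `describe` is hypothetical. The general category `Nd` (decimal digit number) answers "Is the character a digit?", and `str.upper` applies Python's full uppercase mapping, so a single character can map to more than one.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Collect a few Unicode properties of a single character (hypothetical helper)."""
    category = unicodedata.category(ch)
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "general_category": category,
        # "Is the character a digit?": category Nd means decimal digit number
        "is_decimal_digit": category == "Nd",
        # numeric value of a decimal digit, or None if the character has none
        "digit_value": unicodedata.decimal(ch, None),
        # str.upper uses the full case mapping, which may yield several characters
        "uppercase": ch.upper(),
    }

# ASCII digit, ARABIC-INDIC DIGIT SEVEN, Latin letter a, LATIN SMALL LETTER SHARP S
for ch in ["7", "\u0667", "a", "\u00df"]:
    print(describe(ch))
```

Note that U+0667 counts as a decimal digit with value 7 even though it is not an ASCII digit, and that the uppercase counterpart of ß is the two-letter string "SS".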
This chapter concentrates on properties in a rigorous sense: properties defined for
characters in the Unicode standard in an exact, objective, formalized manner. All the
properties discussed here differ from purely verbal descriptions of characters in the
standard, such as descriptions of possible glyph variation. For example, the statement
that the ASCII quotation mark " (U+0022) has a vertical glyph is surely relevant,
but not formalized. The same applies to other similar notes in the text of the standard
and the annotations in the code charts.
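To make the distinction concrete, here is a minimal sketch, again assuming Python's `unicodedata` module, that retrieves some formal properties of U+0022. Each value comes from the Unicode Character Database. Nothing in that data records that the glyph is vertical, because that remark exists only as verbal description.

```python
import unicodedata

ch = "\u0022"  # ASCII quotation mark "

# Formal, machine-readable properties of U+0022
props = {
    "name": unicodedata.name(ch),                    # 'QUOTATION MARK'
    "category": unicodedata.category(ch),            # 'Po' (Punctuation, other)
    "bidi_class": unicodedata.bidirectional(ch),     # 'ON' (Other Neutral)
    "mirrored": unicodedata.mirrored(ch),            # 0 (not mirrored)
    "decomposition": unicodedata.decomposition(ch),  # '' (no decomposition)
}

for key, value in props.items():
    print(f"{key}: {value!r}")
```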
The Unicode standard designates some properties as normative. Such a property is
prescriptive in the sense that if a conforming implementation uses the property, it must
do so in accordance with its ...