Chapter 3. Properties of Unicode characters
Our concern in this chapter is the information that Unicode provides for each character. According to our definition, a character is a description of a certain class of glyphs. One of these glyphs, which we have called the representative glyph, is shown in the Unicode charts, both in their hard-copy version [335] and in the PDF files available on the Web ([334]).
Unicode defines the identity of a character as the combination of its description and its representative glyph. The semantics of a character, in turn, are given by that identity together with the character's normative properties.
This brings us to character properties. These are data on characters that have been collected over time and that can help us to make better use of Unicode. For example, one normative property of a character is its category, and one possible category is "punctuation". A developer can thus find out which characters of a given script are punctuation marks, and disregard those characters when sorting text, for example, without knowing anything at all about the script itself. Another property (not a normative one in this instance, and therefore more ambiguous) is the uppercase/lowercase correspondence. Unicode provides a table of these correspondences, which software can apply directly to convert a string from one case to the other (when the concept of case even applies to the writing system in question). Of course, none of these operations ...
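The two uses of properties mentioned above can be sketched in a few lines. This is only an illustration, assuming Python and its standard `unicodedata` module as the source of property data; the text does not prescribe any language or library. The helper names `strip_punctuation` and `sort_ignoring_punctuation` are hypothetical. Punctuation is recognized here by a general category code beginning with "P" (for example `Po`, `Pi`, `Pf`).

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Return text without the characters whose general category is punctuation.

    Unicode general category codes for punctuation all begin with 'P'
    (Pc, Pd, Ps, Pe, Pi, Pf, Po), so no knowledge of the script is needed.
    """
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def sort_ignoring_punctuation(words: list[str]) -> list[str]:
    """Sort strings by code point after discarding punctuation (hypothetical helper).

    This is a simplification for illustration, not a full collation algorithm.
    """
    return sorted(words, key=strip_punctuation)


if __name__ == "__main__":
    # Punctuation from several scripts is identified by category alone.
    for ch in "¿«»،":
        print(f"U+{ord(ch):04X}", unicodedata.category(ch))

    print(strip_punctuation("«γεια»"))
    print(sort_ignoring_punctuation(["«beta»", "alpha", "(gamma)"]))

    # Case conversion relies on the correspondences in the Unicode data;
    # a script without case is left unchanged.
    print("γεια".upper())
    print("中文".upper())
```

Sorting by raw code point after stripping punctuation is a deliberate simplification; the sketch is meant only to show that the category property can be consulted without any script-specific knowledge.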