displayed in all uppercase. The character data is preserved as such, however, so if
you later select the text again and uncheck the checkbox, the original form becomes
visible. You can also use this approach when defining a style in MS Word, since
the style settings have font formatting options, too.
Both of these mappings might perform simple mapping only, so they should be used
with caution; e.g., for texts in German and Turkish. Also note that mapping to titlecase
does not produce grammatically correct results for English, since it capitalizes every
word, but by English rules, words like “a” and “to” should be left lowercase.
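The titlecase problem can be sketched in Python. This is only an illustration: str.title() stands in for a word-by-word titlecase mapping, and both SMALL_WORDS and english_title are hypothetical names with a deliberately incomplete word list, not a rule set taken from any style guide.

```python
# Hypothetical, incomplete list of words that English titles usually leave lowercase.
SMALL_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on"}


def english_title(text: str) -> str:
    """Capitalize words, but keep small words lowercase unless first or last."""
    words = text.lower().split(" ")
    result = []
    for i, word in enumerate(words):
        is_inner = 0 < i < len(words) - 1
        if is_inner and word in SMALL_WORDS:
            result.append(word)
        else:
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


heading = "a guide to the galaxy"
mapped = heading.title()          # capitalizes every word
edited = english_title(heading)   # leaves inner "to" and "the" lowercase
print(mapped)
print(edited)
```

The mapping alone cannot know which words a language treats as minor, so an editorial rule of this kind has to be layered on top of it.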
In HTML or XML authoring, you might use a Cascading Style Sheet (CSS) declaration
like text-transform: uppercase. Applied to a string, it performs a conversion to
uppercase when selecting glyphs for rendering the characters. The other values of the
property are lowercase, capitalize (= titlecase), and none.
Such operations can be a better choice than conversions at the character level, since
keeping the data itself in mixed case helps in editing, spellchecking, etc. Moreover,
character-level case mappings are irreversible: there is no way to deduce the original
form from the case-mapped string.
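A small Python sketch shows why the mapping cannot be undone: several distinct originals collapse into the same uppercase string. The helper name group_by_uppercase is made up for this illustration, and Python's built-in str.upper() is used as one example of a character-level case mapping.

```python
from collections import defaultdict


def group_by_uppercase(words):
    """Group the input words by their uppercase form."""
    groups = defaultdict(list)
    for word in words:
        groups[word.upper()].append(word)
    return dict(groups)


originals = ["Fuß", "Fuss", "fuss", "McDonald", "Mcdonald"]
groups = group_by_uppercase(originals)
for upper_form, sources in groups.items():
    print(upper_form, "<-", sources)
```

Given only "FUSS" or "MCDONALD", nothing in the string says which of the grouped originals it came from.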
Such an approach also lets you use different stylesheets for the same data, using
conversion to uppercase only when it is judged to be the best way, e.g., for headings
(typically, due to lack of better typographic possibilities). However, beware that such
transformations might not work by Unicode rules for all characters and that they might
apply simple mappings. CSS specifications do not specify how the mappings are
performed. In practice, if you write <h1>Fuß</h1> in HTML and have the rule
h1 { text-transform: uppercase } in CSS, you probably get “FUß” or even “FUS”
(incorrect) depending on the browser, instead of the full case mapping result “FUSS.”
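The difference between the “FUß” and “FUSS” outcomes can be imitated in Python. In this sketch, simple_upper is a hypothetical helper that only approximates a simple (one-to-one) mapping by refusing any uppercase result longer than one character; for the few characters that have both a simple mapping and a longer full mapping, it may differ from the real simple mapping. Python's str.upper() applies full mappings.

```python
def simple_upper(text: str) -> str:
    """Approximate a one-to-one uppercase mapping: keep a character
    unchanged when its uppercase form would expand to several characters."""
    out = []
    for ch in text:
        up = ch.upper()
        out.append(up if len(up) == 1 else ch)
    return "".join(out)


word = "Fuß"
simple_result = simple_upper(word)  # ß has no single-character uppercase here
full_result = word.upper()          # ß expands to SS
print(simple_result, full_result)
```

The full result is longer than the input, which is one reason implementations that assume one character in, one character out fall back to simple mappings.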
Collation and Sorting
Sorting is a well-known concept: we put data into a specific order, such as alphabetical
order. Collating order is a more technical concept, but closely related: the collating
order of characters and strings is the order by which sorting of character data takes
place. The collating order says, for example, that “a” < “b” or that “&” < “.”, using the
less-than sign to mean “precedes (in the ordering).” Sorting is often called
“alphabetizing,” although it operates on strings in general, not just on alphabetic
characters.
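As a rough illustration, Python's default string comparison defines one possible collating order, namely order by code point. The sketch below assumes nothing beyond the standard library; the sample strings are invented. It shows that this order agrees with the two comparisons mentioned above but places every uppercase letter before every lowercase one, which is rarely what readers expect.

```python
# Default comparison in Python is by code point.
assert "a" < "b"
assert "&" < "."

entries = ["banana", "Zebra", "apple", "&co", ".net"]

by_code_point = sorted(entries)
# Comparing case-folded keys is a crude step toward a dictionary-like order.
case_insensitive = sorted(entries, key=str.casefold)

print(by_code_point)
print(case_insensitive)
```

Changing the key function changes the collating order while the sorting algorithm stays the same, which is the distinction between the two concepts.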
Sorting is relevant when we present a large amount of text data to users and the data
has some key component, such as a person’s name in a telephone catalog or a term in
a glossary. People are used to scanning through lists and tables, expecting them to be
in an alphabetic order (or, more generally, collating order) they have learned at school.
In the global context, it is important that different people have learned different orders.
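A toy Python sketch can make the point concrete. It assumes two simplified conventions: a Swedish-style order in which å, ä, and ö follow z, and a German dictionary-style order in which ä, ö, and ü sort with their base letters. The names SWEDISH_ALPHABET, swedish_key, and german_key are hypothetical, the sample strings are lowercase only, and real collation involves considerably more rules than this.

```python
# Simplified Swedish-style alphabet: å, ä, ö come after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

# Simplified German dictionary-style rule: umlauts sort as their base letters.
GERMAN_BASE = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def swedish_key(word: str):
    """Sort key: position of each letter in the Swedish-style alphabet.
    Raises ValueError for characters outside that alphabet."""
    return [SWEDISH_ALPHABET.index(ch) for ch in word.lower()]


def german_key(word: str) -> str:
    """Sort key: the word with umlauts replaced by base letters."""
    return word.lower().translate(GERMAN_BASE)


words = ["zon", "äpple", "apa", "öl"]
swedish_order = sorted(words, key=swedish_key)
german_order = sorted(words, key=german_key)
print(swedish_order)
print(german_order)
```

The same four strings end up in two different orders, each of which looks correct to readers used to that convention.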