
characters of the fictional Klingon language are not commonly used in texts, so they
have not been included in Unicode so far. The language’s fictional nature is no
obstacle per se; what matters is actual use in books, magazines, web pages, or elsewhere.
As a separate issue, Unicode does not contain, and does not aim to contain, every
character as a separately coded character with its own code point. Instead, characters
with diacritic marks can be represented as a character sequence consisting of a base
character and one or more combining diacritic marks.
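As a small illustration of the base-plus-combining-mark idea, the following Python sketch compares letter e with acute written as a single code point with the same letter written as a base character followed by a combining mark. It uses the standard `unicodedata` module; the variable names are chosen for this example only, and the normalization step is shown merely as one way of relating the two forms.

```python
import unicodedata

# Precomposed form: LATIN SMALL LETTER E WITH ACUTE (U+00E9), one code point.
precomposed = "\u00e9"

# Sequence form: base character "e" followed by
# COMBINING ACUTE ACCENT (U+0301), two code points.
decomposed = "e\u0301"

# The two strings normally render alike but differ as code point sequences.
print(len(precomposed), len(decomposed))   # 1 2
print(precomposed == decomposed)           # False

# Normalization converts between the two representations.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
```

Not every base-plus-mark sequence has a precomposed counterpart, so a program that compares text should not assume that composing always yields a single code point.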
Identity of Characters
In Chapter 1, we discussed the concept of character and described how Unicode defines
a particular character by assigning a code number, a Unicode name, and various
properties to it and by showing a representative glyph. Here we consider some of the more
technical aspects of defining characters.
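The code number, name, and properties of a character can be inspected programmatically. The sketch below uses Python's standard `unicodedata` module; the helper `describe` is a hypothetical name introduced here, and the General Category is shown as just one example of a property. The representative glyph is part of the code charts and is not available through this module.

```python
import unicodedata

def describe(ch):
    """Return (code number, Unicode name, General Category) for one character.

    `describe` is an illustrative helper, not part of any library.
    """
    code = f"U+{ord(ch):04X}"            # code number in the usual U+ notation
    name = unicodedata.name(ch)          # Unicode name
    category = unicodedata.category(ch)  # one of the character's properties
    return code, name, category

print(describe("$"))   # ('U+0024', 'DOLLAR SIGN', 'Sc')
print(describe("A"))   # ('U+0041', 'LATIN CAPITAL LETTER A', 'Lu')
```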
Characters as elementary units of text
If we consider normal English text, it looks rather obvious what the elementary units
of text are: letters, digits, spaces, punctuation marks, and a few special characters like
$. These units look indivisible, atomic, at any structural level. None of the characters
appears to be a composition of other characters, or of any parts.
Things get more complicated in other writing systems, and we need not consider
anything more complicated than accented letters, e.g., letter e with acute ...