son, at http://www.evertype.com/alphabets/.
It is extensive and based on detailed re-
search, although it partly applies different criteria to different languages: for some lan-
guages, it includes only the basic modern alphabet; for others, it lists historical char-
acters and other characters that are not used in normal writing. The CLDR database,
discussed in Chapter 11, contains information on the use of letters in different lan-
guages.
Variation of Writing Systems
The most widely used writing systems, or scripts, can be classified as follows:
Alphabetic scripts
Denote sounds with letters, though usually not in a strict one-to-one manner. Ex-
amples: Latin, Greek, and Cyrillic scripts, each of which exists in different versions.
Consonant scripts, or abjads
Basically denote consonants, leaving vowels to be inferred; however, consonant
scripts may have letters for long vowels, and in some situations even short vowels
are written using small signs attached to consonants. Examples: Hebrew and Ara-
bic scripts.
Figure 1-4. Sample information on a character in the eki.ee database
Abugida scripts
Use consonant letters that, in their base form, imply a particular vowel after
the consonant. Other vowels, or the absence of a vowel, are indicated by
additional marks. Many South and Southeast Asian scripts belong to this
category, for example the Devanagari script used for many Indic languages.
Syllabic scripts
Use basically one character for each syllable. Examples: the Hiragana and Katakana
scripts, used for Japanese.
Ideographic scripts
Use basically one character for one (short) word. The most widely known
ideographic script is Han, often called Chinese script. Because it is also used
(in part) for other languages, especially Japanese and Korean, it is often
called “CJK.”
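
The classification above can be explored in code. The following is a minimal sketch in Python, not a definitive method: the standard unicodedata module does not expose the Unicode Script property, so the sketch assumes that the first word of a character's Unicode name is a usable hint of its script. The helper names script_guess and writing_system, and the SCRIPT_TYPE table (filled in from the list above), are hypothetical.

```python
import unicodedata

# Hypothetical lookup table, filled in from the classification in the text.
SCRIPT_TYPE = {
    "LATIN": "alphabetic",
    "GREEK": "alphabetic",
    "CYRILLIC": "alphabetic",
    "HEBREW": "consonant (abjad)",
    "ARABIC": "consonant (abjad)",
    "DEVANAGARI": "abugida",
    "HIRAGANA": "syllabic",
    "KATAKANA": "syllabic",
    "CJK": "ideographic",
}

def script_guess(ch: str) -> str:
    """Heuristic: take the first word of the character's Unicode name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"

def writing_system(ch: str) -> str:
    """Map the guessed script to a writing-system type, if it is in the table."""
    return SCRIPT_TYPE.get(script_guess(ch), "unclassified")

if __name__ == "__main__":
    # One sample character per script mentioned in the list above.
    for ch in "a\u03b1\u044f\u05d0\u0628\u0915\u3042\u30a2\u4e2d":
        print(f"U+{ord(ch):04X}", script_guess(ch), writing_system(ch))
```

The heuristic fails for characters whose names do not begin with a script name (digits and punctuation, for example), so it is only suitable for illustration.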
Consonantal writing may sound impossible, because it introduces so much ambiguity.
However, although an individual written form of a word is often ambiguous, a reader
who understands the language well can usually resolve the ambiguity from the context.
Moreover, languages written with a consonantal script typically have a structure that
makes this easier than it would be for English, for example. In such languages, a word
root is a pattern of consonants that expresses a common theme, and vowels mainly
express variations of that theme, so the vowels can usually be inferred from the
grammatical context.
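
As an illustration of how vowel signs attach to consonants in a consonant script, the following Python sketch removes the vowel points from a pointed Hebrew word, leaving the consonantal spelling that a reader would normally see. It assumes that the points are encoded as nonspacing marks (general category Mn), which holds for the sample word used here; the function name strip_marks is hypothetical.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Drop nonspacing marks (category Mn), keeping the base letters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")

# "shalom" written with vowel points: shin + qamats + shin dot,
# lamed, vav + holam, final mem.
pointed = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"
unpointed = strip_marks(pointed)

if __name__ == "__main__":
    for ch in pointed:
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
    print(len(pointed), "code points with points,", len(unpointed), "without")
```

Note that the same function would also strip accents from Latin letters, so it is a demonstration of the encoding, not a general-purpose tool.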
The word “script” is often used in character code contexts instead of “writing system.”
It is important to distinguish this sense from the programming sense of “script”:
a certain type of computer program, such as a Perl script.
Some scripts, such as the Latin script, are written with spaces between words, and a
space is normally a permissible line break point. Hyphenation may introduce other
break points. Other scripts may permit line breaks more freely.
The Latin script and many other scripts are written left to right, with lines proceeding
from top to bottom. These are not universal properties of human writing, and even the
Latin script is historically based on a script that was written right to left. Unicode ad-
dresses the problem of left-to-right versus right-to-left writing in two ways: by defining
inherent directionality for characters and by defining control characters for affecting
writing direction. For example, Hebrew and Arabic letters have inherent right-to-left
directionality. Special methods are needed when text in such letters contains names or
quotations that have the opposite directionality, or vice versa.
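
The inherent directionality of a character can be inspected programmatically. The sketch below uses Python's standard unicodedata.bidirectional function, which returns the character's bidirectional class (for example L for left-to-right, R for right-to-left, and AL for Arabic letters). The helper name direction and its three-way simplification are assumptions made for illustration; the full Unicode bidirectional algorithm is considerably more involved.

```python
import unicodedata

def direction(ch: str) -> str:
    """Simplify a character's bidirectional class to ltr, rtl, or other."""
    bidi = unicodedata.bidirectional(ch)
    if bidi == "L":
        return "ltr"
    if bidi in ("R", "AL"):      # right-to-left, including Arabic letters
        return "rtl"
    return "other"               # digits, spaces, punctuation, controls, ...

# Two of the control characters that affect writing direction.
LRM = "\u200e"   # LEFT-TO-RIGHT MARK
RLM = "\u200f"   # RIGHT-TO-LEFT MARK

if __name__ == "__main__":
    for ch in ("A", "\u05d0", "\u0628", " ", LRM, RLM):
        print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), direction(ch))
```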
In the Latin script, each character is normally displayed as a separate image on screen
or paper, though the spacing between characters may vary. In other scripts, formatting
text for visual presentation can be considerably more difficult: the shape of a character
may depend on context; adjacent characters can be written together (using a ligature
or using cursive writing where characters join smoothly); and a character might be
displayed as an auxiliary symbol above, below, before, or after another character.
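
Some traces of these phenomena are visible in the character data itself. The Python sketch below, using the standard unicodedata module, looks at a combining mark, a ligature character, and the four contextual presentation forms of the Arabic letter beh. It is only an illustration: in normal text the contextual shapes and ligatures are chosen by the rendering software, and the presentation-form characters used here exist in Unicode mainly for compatibility.

```python
import unicodedata

# A combining mark is displayed on top of the preceding base character.
ACUTE = "\u0301"                       # COMBINING ACUTE ACCENT
combining_class = unicodedata.combining(ACUTE)
composed = unicodedata.normalize("NFC", "e" + ACUTE)

# A ligature character decomposes into its letters under NFKC.
fi_letters = unicodedata.normalize("NFKC", "\ufb01")   # LATIN SMALL LIGATURE FI

# Contextual shapes of the Arabic letter beh (U+0628):
# isolated, final, initial, and medial presentation forms.
BEH_FORMS = ("\ufe8f", "\ufe90", "\ufe91", "\ufe92")
beh_bases = [unicodedata.normalize("NFKC", form) for form in BEH_FORMS]

if __name__ == "__main__":
    print(combining_class, composed, fi_letters)
    for form, base in zip(BEH_FORMS, beh_bases):
        print(unicodedata.name(form), "->", unicodedata.name(base))
```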