
CHAPTER 1
Characters as Data
Computers were originally built to process numbers. Over the last few decades, they've
become increasingly good at handling text as well, but the transition from human
scribbling and beautiful typography to bits and bytes has been complicated. Going from
a paper document to a computerized representation of that document means learning
how the computer handles text, which in turn means understanding characters, character
codes, fonts, and encodings. Unicode provides a set of solutions for some of these
problems, while retaining presentation flexibility for making text look as we feel it
should.
Introduction to Characters and Unicode
Computer programs use two basic data types in most of their processing: characters
and numbers. These basic types are combined in various ways to create strings, arrays,
records, and other data structures. (Inside the computer, characters are numbers, but
these numbers are handled very differently from numbers meant for
calculation.)
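As a small illustration of the point that characters are numbers inside the computer, the following sketch uses Python's built-in ord and chr functions to move between a character and its number. The sample string and the variable names are chosen here purely for illustration and are not taken from the text.

```python
# A minimal sketch: each character corresponds to a number (its code).
# The sample string is an assumption made for illustration:
# "A", "e with acute accent" (U+00E9), and the euro sign (U+20AC).
text = "A\u00e9\u20ac"

# ord() gives the number behind each character.
codes = [ord(ch) for ch in text]
print(codes)  # [65, 233, 8364]

# chr() goes the other way, from number back to character.
restored = "".join(chr(n) for n in codes)
print(restored == text)  # True

# The digit character "7" is not the number 7: as a character it has
# its own code, and converting it for calculation is a separate step.
digit_code = ord("7")
digit_value = int("7")
print(digit_code, digit_value)  # 55 7
```

The last two lines show why character data and numeric data are handled differently: the character "7" is stored as the number 55, which only becomes the quantity 7 after an explicit conversion.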
Early computers were largely oriented toward numerical computation. However,
characters were used early on in administrative data processing, where names, addresses,
and other data needed to be stored and printed as strings. Text processing on computers
became more common much later, when computers had become so affordable that
they replaced typewriters. At present, most text documents are