Good Old ASCII
ASCII is still the set of characters that work safely in most text applications and on the
Internet. Almost all programming languages, command languages, markup languages,
Internet protocol headers, and many other notation systems still exclusively use ASCII
in their basic syntax. They may allow other characters in contexts like quoted strings,
but the commands, reserved words, and operators are written using good old ASCII.
Moreover, most character codes currently in use can be regarded as extensions of ASCII:
they preserve the meaning of code numbers 0 through 127 and add some more.
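The extension property can be checked directly. The following is a minimal sketch, assuming Python's built-in codecs; the three encodings tested and the helper name agrees_with_ascii are our own choices for illustration, and cp037 (an EBCDIC code page) is included only as a contrasting code that is not an ASCII extension.

```python
# Bytes 0..127, i.e. every ASCII code number.
ascii_bytes = bytes(range(128))

# A few encodings commonly described as ASCII extensions.
# This particular selection is an assumption made for illustration.
EXTENSIONS = ["latin-1", "cp1252", "utf-8"]


def agrees_with_ascii(encoding: str) -> bool:
    """Return True if `encoding` decodes bytes 0..127 exactly as ASCII does."""
    return ascii_bytes.decode(encoding) == ascii_bytes.decode("ascii")


results = {name: agrees_with_ascii(name) for name in EXTENSIONS}
print(results)  # every value is True

# Contrast: an EBCDIC code page assigns these code numbers differently.
print(agrees_with_ascii("cp037"))  # False
```

Only code numbers 0 through 127 are compared here; the extensions differ from one another in what they add above that range.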
On the other hand, ASCII has a very small character repertoire. Historically, it was a
big improvement over even more restricted character codes, but it was created at a time
when bits were very expensive. ASCII was designed to be represented in 7 bits, and
many character positions were reserved for control codes such as linefeed (LF) and
escape (ESC). Only 95 of the 128 code positions (32 through 126, counting the space)
were assigned to printable characters.
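The arithmetic behind the small repertoire is easy to reproduce. This is a rough sketch, assuming the usual division of ASCII into control codes at positions 0 through 31 plus 127 (DEL), with everything else printable; the variable names are ours.

```python
TOTAL = 2 ** 7  # 7 bits give 128 code positions

# Control codes occupy positions 0..31 plus 127 (DEL).
control = [code for code in range(TOTAL) if code < 32 or code == 127]

# The remaining positions are printable, counting the space at position 32.
printable = [code for code in range(TOTAL) if code not in control]

LF, ESC = 0x0A, 0x1B  # linefeed and escape, both control codes

print(TOTAL, len(control), len(printable))  # 128 33 95
print(LF in control, ESC in control)        # True True
print("".join(chr(code) for code in printable))
```

The last line prints the entire printable repertoire, from the space through the tilde.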
Moreover, since the needs of programming were considered more important than those of
text processing, many of the printable positions went to technical characters. Even
“smart” quotation marks were omitted; the idea was that the ASCII quotation mark, ",
would serve as a neutral quotation mark.
American Origin
The name ASCII is originally an acronym for “American Standard Code for Information
Interchange.” The ASCII code was developed in the United States and standardized by
ANSI, the American National Standards Institute. The standard is often referred to as
ANSI X3.4-1986, but the current version is ANSI INCITS 4-1986 (R2002).
The creation of ASCII started in the late 1950s, and several additions and modifications
were made in the 1960s. The 1963 version had several unassigned code positions. The
ANSI standard, where those positions were assigned, mainly to accommodate lowercase
letters, was approved in 1967/1968, and later modified slightly.
The name US-ASCII is also used, and is even the preferred name in some recommendations,
to distinguish ASCII proper from different “national variants of ASCII.” In
principle, the name ASCII is unambiguous, since the “variants” are just different codes
with more or less resemblance to ASCII and with names of their own.
Contrary to popular belief, the designers of ASCII did not limit its scope to the English
language. Some characters were included for the purpose of writing accented letters.
For example, the tilde character, ~, was meant to be overprinted on a letter: typing “n,”
Backspace, and ~ on paper would produce a character that looks like ñ. This practice
never became popular, and the characters introduced for it were put to other uses as
well, creating conflicting requirements in font design. Still, ASCII did try to address
the needs of other languages.
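The overprinting sequence for ñ can be written out as ASCII code numbers. This is only an illustrative sketch; the variable names are ours, and a modern terminal will typically show just the tilde rather than an overstruck letter, since screens replace a character instead of printing over it.

```python
# The letter n, Backspace, then tilde: the overprinting sequence for ñ.
overprint = "n\b~"

# All three characters are plain ASCII, so encoding cannot fail.
encoded = overprint.encode("ascii")
codes = list(encoded)

print([hex(code) for code in codes])  # ['0x6e', '0x8', '0x7e']
print(all(code < 128 for code in codes))  # True
```

Backspace (code 8) is one of the control codes; on a printing terminal it moved the print head back one position so that the tilde landed on top of the letter.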