applicable syntax. For example, the rules of a programming language might restrict the
character repertoire in identifier names to letters, digits, and one or two other characters.
On the other hand, the underscore (low line) character _ is often usable in names,
and it normally works reliably.
The Misnomer “8-bit ASCII”
The phrase “8-bit ASCII” is used surprisingly often. It follows from the discussion in
the previous section that in reality ASCII is strictly and unambiguously a 7-bit code in
the sense that all code positions are in the range 0–127. It can be, and it usually is,
represented using 8-bit bytes, but with the first (most significant) bit always zero, or
used for other purposes so that it is not part of the encoded form of a character.
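The 7-bit property is easy to check in code. The following is a minimal Python sketch, not part of the original text; the helper name is_ascii_bytes is hypothetical and chosen only for illustration. It tests whether every byte in a sequence has its most significant bit set to zero, that is, whether every value lies in the range 0–127.

```python
def is_ascii_bytes(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits (value 0..127).

    Hypothetical helper for illustration only.
    """
    # 0x80 is the mask for the most significant bit of an 8-bit byte.
    return all(byte & 0x80 == 0 for byte in data)


# Pure ASCII text encodes to bytes whose top bit is always zero.
encoded = "Hello, ASCII!".encode("ascii")
ascii_ok = is_ascii_bytes(encoded)

# A byte such as 0xE9 lies outside the ASCII range.
non_ascii_ok = is_ascii_bytes(b"caf\xe9")

# Python's own "ascii" codec rejects such bytes when decoding.
try:
    b"caf\xe9".decode("ascii")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True
```

Here ascii_ok is true, non_ascii_ok is false, and the strict decode fails, which is the practical meaning of "ASCII is a 7-bit code" even when the data is stored in 8-bit bytes.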
The misnomer “8-bit ASCII” most often denotes windows-1252, the 8-bit code defined
by Microsoft for use in the Western world. More generally, “8-bit ASCII” is used to refer
to various character codes that are extensions of ASCII and mutually more or less
incompatible. The character repertoire in such a code contains ASCII as a subset, the
code numbers are in the range 0–255, and the code numbers of ASCII characters equal
their ASCII codes.
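As a rough illustration of both points (ASCII as a common subset, and incompatibility above it), the following Python sketch compares windows-1252 with ISO 8859-1, a code discussed later in this chapter. The codec names "cp1252" and "iso-8859-1" are the names Python's standard library uses for these encodings; the variable names are only for illustration.

```python
# Bytes 0..127 mean the same thing in ASCII and in its 8-bit extensions.
lower_half = bytes(range(128))
same_lower_half = (
    lower_half.decode("ascii")
    == lower_half.decode("cp1252")
    == lower_half.decode("iso-8859-1")
)

# Above 127 the extensions diverge. Byte 0x80 is the euro sign in
# windows-1252, but Python's iso-8859-1 codec maps it to a control
# character (U+0080).
high_byte = b"\x80"
as_cp1252 = high_byte.decode("cp1252")
as_latin1 = high_byte.decode("iso-8859-1")
```

After this runs, same_lower_half is true, while as_cp1252 and as_latin1 are different characters. The same byte therefore displays differently depending on which "8-bit ASCII" the reader assumes.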
ISO 8859 Codes
ISO 8859—or more formally, ISO/IEC 8859—is a family of character code standards.
They were largely developed by Ecma, which distributes ECMA standards that are
equivalent to ISO 8859 standards. ISO 8859 standards are largely oriented toward
languages of European origin.
ISO 8859 codes are widely used on different platforms and in different contexts. For
example, on the Web, ISO 8859-1 was long treated as the default encoding. On
Windows, ISO 8859 as such is not used that much, but the corresponding, somewhat
extended Windows encodings are common. In Unix and Linux, ISO 8859 is very
common.
Each ISO 8859 standard tries to address the needs of one or more specific languages
and cultural environments, within the fairly narrow framework of an 8-bit structure. This
means that in most cases, you cannot represent multilingual text using any single ISO
8859 encoding.
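The limitation can be demonstrated with a short Python sketch. The helper name encodable and the sample strings are hypothetical, chosen only for illustration. French text fits ISO 8859-1 and Russian text fits ISO 8859-5 (the Cyrillic member of the family), but a string that mixes the two fits neither.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


french = "d\u00e9j\u00e0 vu"   # "deja vu" with e-acute and a-grave
russian = "привет"
mixed = french + " / " + russian

results = {
    "french in 8859-1": encodable(french, "iso-8859-1"),
    "russian in 8859-5": encodable(russian, "iso-8859-5"),
    "mixed in 8859-1": encodable(mixed, "iso-8859-1"),
    "mixed in 8859-5": encodable(mixed, "iso-8859-5"),
}
```

The first two entries of results are true and the last two are false: each encoding covers its own languages, and no single one covers the mixed text.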
ISO 8859-1 (ISO Latin 1)
The international standard ISO 8859-1 defines a character repertoire identified as Latin
alphabet No. 1, commonly called ISO Latin 1, as well as a character code for it. The
repertoire contains the ASCII repertoire as a subset, and the code numbers for those
characters are the same as in ASCII. The standard also specifies an encoding, which is
similar to that of ASCII: each code number is presented simply as one octet.
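The one-octet-per-character encoding can be sketched in Python as follows. This example is not from the original text. It relies on the fact that Python's ord() returns a Unicode code point, and that for the characters of ISO Latin 1 the Unicode code points coincide with the ISO 8859-1 code numbers; the sample string is an arbitrary choice.

```python
# "Ärger café": text that uses Latin 1 characters outside ASCII.
text = "\u00c4rger caf\u00e9"
data = text.encode("iso-8859-1")

# One octet per character: the byte count equals the character count.
one_octet_each = len(data) == len(text)

# Each octet is simply the character's code number.
octets_match_codes = list(data) == [ord(ch) for ch in text]

# The ASCII characters keep their ASCII code numbers.
ascii_part_unchanged = "rger caf".encode("iso-8859-1") == "rger caf".encode("ascii")
```

All three checks evaluate to true, which reflects the simplicity of the encoding: no multi-byte sequences and no shifting, just one byte whose value is the code number.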
