
case be displayed each as two or more characters, which have no direct relationship
with the real character in the data. This is because consecutive octets would each be
interpreted as indicating a separate character, instead of being treated together as a
unit according to the encoding.
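The effect can be reproduced in a few lines of Python. This is only a minimal sketch: the sample character and the choice of UTF-8 and ISO-8859-1 as the two encodings are assumptions made for illustration, not something the text prescribes.

```python
# Hypothetical sample data: one non-ASCII character, U+00E9.
original = "é"

# In UTF-8 this single character is represented by two octets, 0xC3 0xA9.
octets = original.encode("utf-8")

# Wrong: ISO-8859-1 treats every octet as a character of its own,
# so the two octets come out as two unrelated characters.
misread = octets.decode("iso-8859-1")

# Right: the two octets are treated together as a unit.
correct = octets.decode("utf-8")

print(len(octets), repr(misread), repr(correct))
```

The misread string has two characters, neither of which is the character that was actually in the data.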
The “Character Set” Confusion
Character encodings are often called character sets, and the abbreviation charset is used
in Internet protocols to denote a character encoding. This is confusing because people
often understand “set” as “repertoire.” In this usage, however, “character set” means a
specific internal representation of characters, and several different “character sets”
can be used for the same repertoire. A character set implicitly defines a repertoire,
though: the collection of characters that can be represented using the character set.
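Both points can be checked with a short Python sketch. The sample characters, the three encodings, and the helper name in_repertoire are assumptions chosen for illustration.

```python
# One character from a shared repertoire, three different "character sets".
char = "é"
encodings = ["iso-8859-1", "utf-8", "utf-16-be"]

# Same character, different octets in each encoding.
encoded = {name: char.encode(name) for name in encodings}
for name, octets in encoded.items():
    print(name, octets.hex())


def in_repertoire(ch: str, encoding: str) -> bool:
    """Hypothetical helper: True if the encoding can represent the character."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The euro sign lies outside the repertoire implied by ISO-8859-1
# but inside the one implied by UTF-8.
print(in_repertoire("€", "iso-8859-1"), in_repertoire("€", "utf-8"))
```

The first loop shows one repertoire member with three representations; the helper shows how an encoding implicitly draws the boundary of its own repertoire.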
It is advisable to avoid the phrase “character set” when possible. The term character
code can be used instead when referring to a collection of characters and their code
numbers. The term character encoding is suitable when referring to a particular
representation.
For example, the word “ASCII” can mean a certain collection of characters, or that
collection along with the code numbers 0–127 assigned to them in the ASCII standard, or
even more concretely, those code numbers (and hence the characters) represented using
an 8-bit byte for each character.
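The three senses can be kept apart in code. The following Python sketch is illustrative only; the variable names and the sample word are assumptions, and ord() is used because Unicode code points coincide with ASCII code numbers in the range 0 to 127.

```python
word = "ASCII"  # hypothetical sample text

# Sense 1: a repertoire, i.e. just a collection of characters.
repertoire = {chr(n) for n in range(128)}
all_in_repertoire = all(ch in repertoire for ch in word)

# Sense 2: a character code, i.e. the characters with their code numbers.
code_numbers = [ord(ch) for ch in word]

# Sense 3: an encoding, i.e. each code number stored in one 8-bit byte.
octets = word.encode("ascii")

print(all_in_repertoire, code_numbers, list(octets))
```

For ASCII the byte values and the code numbers are identical, which is one reason the three senses are so easily run together.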
Working with Encodings
When you use ...