A Brief History of Bits
Before we discuss the different charsets, let’s shift gears a little. Once upon a time, not so long ago, bits were very expensive. Hard disks for storing bits, memory for loading bits, communication equipment for sending bits over the wire: all the resources needed to handle bits were costly. To save on these expensive resources, characters were initially represented by only seven bits. This was enough to represent all letters in the English alphabet, the numbers 0 through 9, punctuation characters, and some control characters. And that was all that was really needed in the early days of computing, since most computers were kept busy doing number crunching.
But as computers were given new tasks, often dealing with human-readable text, seven bits didn’t cut it. Adding one bit made it possible to represent all letters used in the western European languages. But there are other languages besides the western European languages, even though companies based in English-speaking countries often seem to ignore them. And eight bits is not enough to represent all characters used around the world. At first, this problem was partially solved by defining a number of standards for how eight bits should be used to represent different character subsets. Each of the ten ISO-8859 standards defines what is called a charset: a mapping between eight bits (a byte) and a character. For instance, ISO-8859-1, also known as Latin-1, defines the subset used for western European languages, ...
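To make the idea of a charset concrete, here is a minimal sketch using Java's standard java.nio.charset API (this example is mine, not part of the JSP material): the same byte value decodes to different characters, or to no valid character at all, depending on which charset is used to interpret it.

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        // A single byte with the value 0xE9.
        byte[] data = { (byte) 0xE9 };

        // Under ISO-8859-1 (Latin-1), 0xE9 maps to the character 'é'.
        String latin1 = new String(data, StandardCharsets.ISO_8859_1);

        // Under 7-bit US-ASCII, 0xE9 is outside the valid range, so the
        // decoder substitutes the replacement character.
        String ascii = new String(data, StandardCharsets.US_ASCII);

        System.out.println("ISO-8859-1: " + latin1); // é
        System.out.println("US-ASCII:   " + ascii);  // replacement character
    }
}
```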