A Brief History of Bits
Let’s shift gears a little and discuss additional issues to consider when dealing with non-Western European languages. Once upon a time, not so long ago, bits were very expensive. Hard disks for storing bits, memory for loading bits, communication equipment for sending bits over the wire: all the resources needed to handle bits were costly. To save on these expensive resources, characters were initially represented by only seven bits. This was enough to represent all the letters in the English alphabet, the digits 0 through 9, punctuation characters, and some control characters. That was all that was really needed in the early days of computing, because most computers were kept busy doing number crunching.
But as computers were given new tasks, often dealing with human-readable text, seven bits didn’t cut it. Adding one bit made it possible to represent all the letters used in Western European languages, but there are many other languages, even though companies based in English-speaking countries often seem to ignore them. Eight bits is not enough to represent all the characters used around the world. This problem was partly solved by defining a number of standards for how eight bits should be used to represent different character subsets. Each standard in the ISO-8859 family defines what is called a charset: a mapping between 8 bits (a byte) and a character. For instance, ISO-8859-1, also known as Latin-1, defines the subset used for Western European languages.
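To make the charset idea concrete, here is a minimal Java sketch (the class name CharsetDemo and the choice of ISO-8859-5 are just for illustration; ISO-8859-5 support is common but not guaranteed on every JVM) showing that the same byte value decodes to different characters depending on which charset is applied:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        // A byte value by itself carries no character meaning; the charset
        // in effect decides which character it stands for.
        byte[] data = { (byte) 0xE9 };

        // Decoded as ISO-8859-1 (Latin-1), byte 0xE9 is the letter 'é'.
        String latin1 = new String(data, StandardCharsets.ISO_8859_1);

        // Decoded as ISO-8859-5 (Cyrillic), the same byte is a different
        // letter entirely.
        String cyrillic = new String(data, Charset.forName("ISO-8859-5"));

        System.out.println("ISO-8859-1: " + latin1);
        System.out.println("ISO-8859-5: " + cyrillic);
    }
}

The point is simply that a sequence of bytes is meaningless until you know which charset it was encoded with, which is why charset information matters so much when serving pages in non-Western European languages.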