Dealing with Non-Western Languages
Supporting locales with non-Western languages adds another dimension to the subject of localization: the issue of character encoding. As you probably know, the characters displayed on your screen are really represented by sequences of bits. To know which character to display for a sequence of bits, an application (e.g., a browser) consults a mapping between the bit sequences and the characters they represent. ASCII is an early standard mapping; it uses 7 bits to map the numerical values 0 through 127 to the letters of the English alphabet, the digits 0 through 9, punctuation characters, and some control characters. That was all that was really needed in the early days of computing, because most computers were kept busy crunching numbers.
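To make the idea of a mapping concrete, here is a minimal sketch in Java that decodes three byte values as ASCII and looks up the code for a single character. The class name AsciiMappingSketch and its helper methods are hypothetical names chosen for illustration; only the standard java.nio.charset.StandardCharsets constant comes from the platform.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of any library.
public class AsciiMappingSketch {

    // Returns the 7-bit ASCII code for a character, or -1 if the
    // character falls outside the range 0 through 127.
    public static int asciiCode(char c) {
        return c <= 127 ? c : -1;
    }

    // Interprets each byte as an ASCII code and returns the resulting text.
    public static String decodeAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] bits = {74, 83, 70};                       // codes for 'J', 'S', 'F'
        System.out.println(decodeAscii(bits));            // JSF
        System.out.println(asciiCode('A'));               // 65
        System.out.println(Integer.toBinaryString('A'));  // 1000001 (fits in 7 bits)
        System.out.println(asciiCode('\u00E9'));          // -1: e with acute accent is not ASCII
    }
}
```

The last call shows the limitation discussed next: an accented letter has no code in the 7-bit range.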
But as computers were given new tasks, often dealing with human-readable text, 7 bits didn’t cut it. Adding one bit made it possible to represent all letters used in the Western European languages, but it was not enough to represent all characters used around the world. This problem was partly solved by defining a number of standards for using eight bits to represent different character subsets. Each of the ISO-8859 standards defines what is called a charset: a mapping between eight bits (a byte) and a character. For instance, ISO-8859-1, also known as Latin-1, defines the subset used for Western European languages such as English, French, Italian, Spanish, German, and Swedish. ISO-8859-1 is the default ...
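As a rough illustration of what the eighth bit buys, the following sketch decodes a byte above 127 as ISO-8859-1 and then tries to encode the same text as 7-bit ASCII. The class name Latin1Sketch and its methods are hypothetical; the example assumes only the charsets that every Java platform is required to support.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of any library.
public class Latin1Sketch {

    // Interprets each byte as an ISO-8859-1 (Latin-1) character.
    public static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Encodes text as ISO-8859-1: one byte per character.
    public static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Encodes text as US-ASCII; characters outside the 7-bit range
    // are replaced with '?' (byte 0x3F) by String.getBytes.
    public static byte[] encodeAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // 0xE9 is above 127, so it needs the eighth bit.
        byte[] cafe = {0x63, 0x61, 0x66, (byte) 0xE9};
        String text = decodeLatin1(cafe);
        System.out.println(text.equals("caf\u00E9"));       // true
        System.out.println(encodeLatin1(text)[3] & 0xFF);   // 233 (0xE9)
        System.out.println((char) encodeAscii(text)[3]);    // ?
    }
}
```

The accented letter survives a round trip through Latin-1 but is lost when forced into ASCII, which is the kind of mismatch a declared charset is meant to prevent.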