Chapter 27. Internationalization

If the Web is to reach a truly worldwide audience, it needs to be able to support the display of all the languages of the world, with all their unique alphabets and symbols, directionality, and specialized punctuation. This poses a big challenge to HTML constructs as we know them. However, according to the W3C, “energetic efforts” are being made toward this complicated goal.

The W3C’s efforts for internationalization (referred to as “i18n”: an i, then 18 letters, then an n) address two primary issues. The first is the handling of alternative character sets that take into account all the writing systems of the world. The second is how to specify languages and their unique presentation requirements within an HTML document. Many solutions presented by internationalization experts in a document called RFC 2070 were incorporated into the current HTML 4.0 Specification.

This chapter addresses both key issues for internationalization, as well as the new character set and language features in HTML 4.0.

Character Sets

The first challenge in internationalization is dealing with the staggering number of unique character shapes (called “glyphs”) that occur in all the writing systems of the world. This includes not only alphabets, but also the ideographs (characters that indicate a whole word or concept) used in languages such as Chinese, Japanese, and Korean.
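
To give a rough sense of the scale of the problem, the short Python sketch below checks whether a few sample characters fit in ISO 8859-1 (Latin-1), a common 8-bit character set. A single byte can distinguish only 256 values, so ideographs and Hangul syllables fall outside it. The helper name fits_in_latin1 and the sample characters are chosen purely for illustration and do not come from the HTML specification.

```python
def fits_in_latin1(ch: str) -> bool:
    """Return True if ch can be encoded in the 8-bit ISO 8859-1 character set."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        # The character has no slot among the 256 values of this encoding.
        return False


# Illustrative samples, written as escapes so the source file stays ASCII:
# a Latin letter, an accented Latin letter, a Chinese ideograph, a Korean syllable.
samples = ["A", "\u00e9", "\u4e2d", "\ud55c"]

results = {ch: fits_in_latin1(ch) for ch in samples}

for ch, ok in results.items():
    status = "fits in one byte" if ok else "not representable"
    print(f"U+{ord(ch):04X}: {status}")
```

The two Latin characters fit and the two East Asian characters do not, which is why a single 8-bit character set cannot cover all the world's writing systems.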

8-Bit Encoded Character Sets

Character encodings (or character sets) are organizations of characters—units of a written ...
