HTML 4.0 Language Tags

Coordinating character sets is only the first part of the challenge. Even languages that share a character set may have different rules for hyphenation, spacing, quotation marks, punctuation, and so on. In addition to character shapes (glyphs), issues such as directionality (whether the text reads left-to-right or right-to-left) and cursive joining behavior must be taken into account.

This prompted a need for a system of language identification. The W3C responded by incorporating the language tags put forth in the RFC 2070 standard on internationalization.

The “LANG” Attribute

The lang attribute can be added to almost any tag to specify the language of that element’s content. When added to the <html> tag, it specifies the language for the entire document. The following example specifies the document’s language as French:

<HTML LANG="fr">

It can also be used on individual text elements to switch to another language within a document. For example, you can “turn on” Norwegian for just one element:

<BLOCKQUOTE lang="no">...</BLOCKQUOTE>
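The two uses can be combined in one document. The following is a minimal sketch, not taken from the original text: the title and sample sentences are placeholders invented for illustration. It assumes the inheritance behavior defined for HTML 4, in which an element without its own lang attribute takes the language of its nearest ancestor that has one.

```html
<HTML LANG="fr">
<HEAD>
<TITLE>Exemple</TITLE>
</HEAD>
<BODY>
<!-- No lang attribute here, so this paragraph inherits "fr" from HTML -->
<P>Bonjour tout le monde.</P>

<!-- lang="no" overrides the document language for this element only -->
<BLOCKQUOTE LANG="no">God morgen.</BLOCKQUOTE>

<!-- After the blockquote closes, the language is French again -->
<P>Au revoir.</P>
</BODY>
</HTML>
```

Setting the language once on the root element and overriding it only where the text changes keeps the markup short and avoids repeating the attribute on every element.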

The value for the lang attribute is a two-letter language code (not the same as country codes). Table 27-1 lists the currently available language codes.

Table 27-1. Code for the Representation of Names of Languages

Code  Language    Code  Language                  Code  Language
aa    Afar        ia    Interlingua               rn    Kirundi
ab    Abkhazian   id    Indonesian (formerly in)  ro    Romanian
af    Afrikaans   ie    Interlingue               ru    Russian
am    Amharic     ik    Inupiak
