
HTML—HyperText Markup Language
There are three ways to specify charset information in CSS, specifically the in-file method
(similar to the use of the <META> tag in HTML), via the HTTP header, and from the referring
document. Interestingly, the HTML standard states that the HTTP header has higher
precedence than the <META> tag declarations, and contemporary web browsing applications
adhere to this.
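The precedence rule can be sketched in a few lines of Python. This is only an illustration, not a browser's actual algorithm: the function names `charset_from_content_type` and `resolve_charset` are hypothetical, and ranking the referring document last is an assumption, since the text above states only that the HTTP header outranks the in-file declaration.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Extract the charset parameter from a Content-Type header value, if any."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset lowercased, or None if absent.
    return msg.get_content_charset()


def resolve_charset(http_content_type=None, in_file=None, referrer=None):
    """Return the first charset found, checking sources in precedence order.

    Order: HTTP header, then the in-file declaration (<META> or similar),
    then the referring document (this last position is an assumption).
    """
    candidates = (
        charset_from_content_type(http_content_type),
        in_file,
        referrer,
    )
    for candidate in candidates:
        if candidate:
            return candidate.lower()
    return None
```

Under this sketch, a response served with `Content-Type: text/html; charset=EUC-JP` is treated as EUC-JP even if the file itself declares Shift-JIS; the in-file declaration is consulted only when the header carries no charset.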
Automatic encoding detection issues
Most contemporary browsers provide a feature that automatically detects what encoding
is being used, and the algorithms that perform this have improved significantly. Even so,
the EUC-JP and Shift-JIS encodings used for Japanese text can often be difficult to
differentiate. There is a useful trick that helps browsers detect the encoding more
reliably: include an HTML comment near the top of the HTML file. This comment includes
a character whose encoding is unambiguously either EUC-JP or Shift-JIS. I suggest the
following two meaningful characters for this purpose: 東京 (tōkyō, meaning “Tokyo”).
The character codes for these two characters are <C5 EC> and <B5 FE> for EUC-JP, and
<93 8C> and <8B 9E> for Shift-JIS. Note how the second kanji is unambiguously either
EUC-JP or Shift-JIS. The following is an example HTML comment that includes these
two kanji:
<!--- 東京 used for correct automatic encoding detection --->
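As a quick check of why the kanji 東京 work as a detection aid, the following Python sketch encodes them in both encodings and confirms that the bytes of the second kanji (京) are rejected when decoded with the other encoding. The helper names `encoded_hex` and `decodes_as` are hypothetical, and the result reflects the strictness of Python's `euc_jp` and `shift_jis` codecs, which may differ from a given browser's detector.

```python
# U+6771 U+4EAC are the kanji for "Tokyo"; escapes keep the source ASCII-only.
KANJI = "\u6771\u4eac"
SECOND_KANJI = "\u4eac"


def encoded_hex(text, encoding):
    """Return the encoded bytes of text as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


def decodes_as(data, encoding):
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print("EUC-JP:   ", encoded_hex(KANJI, "euc_jp"))
    print("Shift-JIS:", encoded_hex(KANJI, "shift_jis"))
    # The EUC-JP bytes of the second kanji are not valid Shift-JIS, and vice versa.
    print(decodes_as(SECOND_KANJI.encode("euc_jp"), "shift_jis"))
    print(decodes_as(SECOND_KANJI.encode("shift_jis"), "euc_jp"))
```

Because each byte pair for 京 is invalid in the other encoding, a detector that sees it near the top of the file can rule out the wrong candidate early.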
This technique provides a reliable backu