Appendix F. Charsets
The following table lists the
suggested charset(s) for a number of
languages. Charsets are used by servlets that generate multilingual
output; they determine which character encoding a servlet’s
PrintWriter is to use. By default, the
PrintWriter uses the ISO-8859-1 (Latin-1) charset,
appropriate for most Western European languages. To specify an
alternate charset, the charset value must be passed to the
setContentType( ) method before the servlet
retrieves its PrintWriter, for example:
res.setContentType("text/html; charset=Shift_JIS"); // A Japanese charset
PrintWriter out = res.getWriter(); // Writes Shift_JIS Japanese

The charset can also be set implicitly using the setLocale( ) method, for example:
res.setContentType("text/html");
res.setLocale(new Locale("ja", "")); // Sets charset to Shift_JIS
PrintWriter out = res.getWriter(); // Writes Shift_JIS Japanese

The setLocale( ) method assigns a charset to the
response according to the table listed here. Where multiple charsets
are possible, the first listed charset is chosen.
Note that not all web browsers support all charsets or have the fonts available to represent all characters, although at minimum all clients support ISO-8859-1. Further note that the UTF-8 charset can represent all Unicode characters and may be assumed a viable alternative for all languages.
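
The difference between ISO-8859-1 and UTF-8 coverage can be checked with the standard java.nio.charset classes. The following is a minimal sketch and is not part of the servlet API; the class name `CharsetCheck` and the sample string are illustrative assumptions. It asks each charset's encoder whether it can represent a short Japanese string.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of the servlet API): reports whether a
// charset can represent every character of a given string.
final class CharsetCheck {
    static boolean canEncode(Charset charset, String text) {
        // CharsetEncoder.canEncode returns false if any character is unmappable
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Three kanji written as Unicode escapes so the source file stays ASCII
        String japanese = "\u65e5\u672c\u8a9e";
        System.out.println("ISO-8859-1: " + canEncode(StandardCharsets.ISO_8859_1, japanese));
        System.out.println("UTF-8:      " + canEncode(StandardCharsets.UTF_8, japanese));
    }
}
```

A servlet that writes characters its chosen charset cannot encode will typically emit substitute characters such as `?`, so a check of this kind may help when deciding between a language-specific charset and UTF-8.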
| Language | Language Code | Suggested Charsets |
|---|---|---|
| Albanian | sq | ISO-8859-2 |
| Arabic | ar | ISO-8859-6 |
| Bulgarian | bg | ISO-8859-5 |
| Byelorussian | be | ISO-8859-5 ... |
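
The "first listed charset" rule can be sketched as a simple lookup from language code to charset. This is only an illustration under stated assumptions: the class `SuggestedCharsets` and its methods are hypothetical, the map holds only the four rows visible above, and the fallback to UTF-8 for unlisted codes is a choice made for this sketch, not documented container behavior.

```java
import java.util.Map;

// Hypothetical lookup mirroring the rows shown above; this is not the
// servlet container's own implementation of setLocale().
final class SuggestedCharsets {
    // First listed charset per language code (only the rows visible in the table)
    private static final Map<String, String> FIRST_CHARSET = Map.of(
        "sq", "ISO-8859-2",   // Albanian
        "ar", "ISO-8859-6",   // Arabic
        "bg", "ISO-8859-5",   // Bulgarian
        "be", "ISO-8859-5");  // Byelorussian

    // Assumption: language codes missing from the map fall back to UTF-8
    static String charsetFor(String languageCode) {
        return FIRST_CHARSET.getOrDefault(languageCode, "UTF-8");
    }

    // Builds a value suitable for passing to setContentType( )
    static String contentTypeFor(String languageCode) {
        return "text/html; charset=" + charsetFor(languageCode);
    }

    public static void main(String[] args) {
        System.out.println(contentTypeFor("bg"));
        System.out.println(contentTypeFor("xx"));
    }
}
```

In a servlet, the result would be passed to setContentType( ) before the PrintWriter is retrieved, for example `res.setContentType(SuggestedCharsets.contentTypeFor("bg"));`.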