Appendix E. Charsets
Table E-1 lists the suggested charset(s) for a number of languages. Charsets are used by servlets that generate multilingual output; they determine which character encoding a servlet’s PrintWriter is to use. By default, the PrintWriter uses the ISO-8859-1 (Latin-1) charset, appropriate for most Western European languages. To specify an alternate charset, pass the charset value to the setContentType() method before the servlet retrieves its PrintWriter. For example:
    res.setContentType("text/html; charset=Shift_JIS");  // A Japanese charset
    PrintWriter out = res.getWriter();                    // Writes Shift_JIS Japanese
Note that not every web browser supports every charset or has the fonts needed to display every character, although all clients support at least ISO-8859-1. The UTF-8 charset can represent all Unicode characters and can be treated as a viable alternative for any language.
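To make the effect of the charset concrete, the following minimal sketch encodes the same text through a PrintWriter bound to different charsets, using only the standard library rather than the Servlet API. The class name CharsetEncodingSketch and the helper encode() are hypothetical names introduced for illustration. It assumes that a servlet's PrintWriter encodes characters in roughly the same way once the charset has been set.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the Servlet API.
final class CharsetEncodingSketch {

    // Writes the text through a PrintWriter bound to the named charset
    // and returns the bytes that would go out on the wire.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(bytes, Charset.forName(charsetName)));
        out.print(text);
        out.flush(); // push buffered characters through the encoder
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // Latin small letter e with acute
        String kanji = "\u65e5";  // a Japanese character

        // The same character occupies a different number of bytes per charset.
        System.out.println(encode(eAcute, "ISO-8859-1").length);
        System.out.println(encode(eAcute, "UTF-8").length);

        // ISO-8859-1 cannot represent the Japanese character, so the writer
        // substitutes a replacement byte; UTF-8 encodes it directly.
        System.out.println(encode(kanji, "ISO-8859-1").length);
        System.out.println(encode(kanji, "UTF-8").length);
    }
}
```

A character that the chosen charset cannot represent is silently replaced rather than reported as an error, which is one reason to choose the charset deliberately before writing any output.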
Table E-1. Suggested Charsets
| Language | Language Code | Suggested Charsets |
|---|---|---|
| Albanian | sq | ISO-8859-2 |
| Arabic | ar | ISO-8859-6 |
| Bulgarian | bg | ISO-8859-5 |
| Byelorussian | be | ISO-8859-5 |
| Catalan (Spanish) | ca | ISO-8859-1 |
| Chinese (Simplified/Mainland) | zh | GB2312 |
| Chinese (Traditional/Taiwan) | zh (country TW) | Big5 |
| Croatian | hr | ISO-8859-2 |
| Czech | cs | ISO-8859-2 |
| Danish | da | ISO-8859-1 |
| Dutch | nl | ISO-8859-1 |
| English | en | ISO-8859-1 |
| Estonian | et | ISO-8859-1 |
| Finnish | fi | ISO-8859-1 |
| French | fr | ISO-8859-1 |
| German | de | ISO-8859-1 |
| Greek | el | ISO-8859-7 |
| Hebrew | he (formerly iw) | ISO-8859-8 ... |
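The rows shown above could be turned into a small lookup that produces the value to pass to setContentType(). This is only a sketch: the class SuggestedCharsets and its methods charsetFor() and contentTypeFor() are hypothetical names, the map covers only the languages visible in this excerpt, and falling back to UTF-8 for unlisted languages is an assumption based on the note that UTF-8 is a viable alternative for all languages.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup built from the visible rows of Table E-1.
final class SuggestedCharsets {

    private static final Map<String, String> BY_LANGUAGE = new HashMap<>();
    static {
        BY_LANGUAGE.put("sq", "ISO-8859-2"); // Albanian
        BY_LANGUAGE.put("ar", "ISO-8859-6"); // Arabic
        BY_LANGUAGE.put("bg", "ISO-8859-5"); // Bulgarian
        BY_LANGUAGE.put("be", "ISO-8859-5"); // Byelorussian
        BY_LANGUAGE.put("ca", "ISO-8859-1"); // Catalan
        BY_LANGUAGE.put("zh", "GB2312");     // Chinese (Simplified)
        BY_LANGUAGE.put("hr", "ISO-8859-2"); // Croatian
        BY_LANGUAGE.put("cs", "ISO-8859-2"); // Czech
        BY_LANGUAGE.put("da", "ISO-8859-1"); // Danish
        BY_LANGUAGE.put("nl", "ISO-8859-1"); // Dutch
        BY_LANGUAGE.put("en", "ISO-8859-1"); // English
        BY_LANGUAGE.put("et", "ISO-8859-1"); // Estonian
        BY_LANGUAGE.put("fi", "ISO-8859-1"); // Finnish
        BY_LANGUAGE.put("fr", "ISO-8859-1"); // French
        BY_LANGUAGE.put("de", "ISO-8859-1"); // German
        BY_LANGUAGE.put("el", "ISO-8859-7"); // Greek
        BY_LANGUAGE.put("he", "ISO-8859-8"); // Hebrew
        BY_LANGUAGE.put("iw", "ISO-8859-8"); // Hebrew (former code)
    }

    // Returns the suggested charset for a language code and country code.
    // Unlisted languages fall back to UTF-8 (an assumption of this sketch).
    static String charsetFor(String language, String country) {
        if ("zh".equals(language) && "TW".equals(country)) {
            return "Big5"; // Chinese (Traditional/Taiwan)
        }
        String charset = BY_LANGUAGE.get(language);
        return charset != null ? charset : "UTF-8";
    }

    // Builds the string a servlet would pass to setContentType().
    static String contentTypeFor(String language, String country) {
        return "text/html; charset=" + charsetFor(language, country);
    }

    public static void main(String[] args) {
        System.out.println(contentTypeFor("el", ""));
        System.out.println(contentTypeFor("zh", "TW"));
    }
}
```

In a servlet, the returned string would be passed to res.setContentType() before calling res.getWriter(), as in the example earlier in this appendix.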