Appendix E. Charsets

Table E-1 lists the suggested charset(s) for a number of languages. Charsets are used by servlets that generate multilingual output; they determine which character encoding a servlet’s PrintWriter uses. By default, the PrintWriter uses the ISO-8859-1 (Latin-1) charset, which is appropriate for most Western European languages. To specify an alternate charset, pass the charset value to the setContentType() method before the servlet retrieves its PrintWriter. For example:

res.setContentType("text/html; charset=Shift_JIS");  // A Japanese charset
PrintWriter out = res.getWriter();  // Writes Shift_JIS Japanese
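
To see what the charset choice means at the byte level, the following minimal sketch writes the same text through a PrintWriter bound to two different charsets. It runs outside a servlet container; the class name EncodingDemo and the method encode() are hypothetical names used only for illustration, and the in-memory stream stands in for the response body.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

public class EncodingDemo {
    // Hypothetical helper: writes text through a PrintWriter that encodes
    // with the given charset and returns the bytes that were produced.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, charset));
        out.print(text);
        out.flush();  // push buffered characters through the encoder
        return sink.toByteArray();
    }
}
```

With this helper, the single character "\u00e9" (e with an acute accent) encodes to one byte under ISO-8859-1 and to two bytes under UTF-8, which is why the client must be told which charset the writer used.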

Note that not all web browsers support all charsets or have the fonts needed to display all characters, although at a minimum all clients support ISO-8859-1. The UTF-8 charset can represent every Unicode character, so it may be considered a viable alternative for all languages.
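
One way to act on the UTF-8 suggestion is to fall back to it whenever the preferred charset is not available on the server's Java runtime. The sketch below is an assumption-laden illustration, not a prescribed approach: the class CharsetChooser and the method contentTypeFor() are hypothetical, and the check covers only what the JVM supports, not what the client can display.

```java
import java.nio.charset.Charset;

public class CharsetChooser {
    // Hypothetical helper: builds a Content-Type value for the preferred
    // charset, falling back to UTF-8 when the JVM does not support it.
    static String contentTypeFor(String preferredCharset) {
        String charset;
        try {
            charset = Charset.isSupported(preferredCharset) ? preferredCharset : "UTF-8";
        } catch (IllegalArgumentException e) {
            // Thrown for a null or syntactically illegal charset name.
            charset = "UTF-8";
        }
        return "text/html; charset=" + charset;
    }
}
```

In a servlet, the returned string would be passed to res.setContentType() before the call to res.getWriter(), following the same ordering as the example above.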

Table E-1. Suggested Charsets

Language                        Language Code      Suggested Charsets
------------------------------  -----------------  ------------------
Albanian                        sq                 ISO-8859-2
Arabic                          ar                 ISO-8859-6
Bulgarian                       bg                 ISO-8859-5
Byelorussian                    be                 ISO-8859-5
Catalan (Spanish)               ca                 ISO-8859-1
Chinese (Simplified/Mainland)   zh                 GB2312
Chinese (Traditional/Taiwan)    zh (country TW)    Big5
Croatian                        hr                 ISO-8859-2
Czech                           cs                 ISO-8859-2
Danish                          da                 ISO-8859-1
Dutch                           nl                 ISO-8859-1
English                         en                 ISO-8859-1
Estonian                        et                 ISO-8859-1
Finnish                         fi                 ISO-8859-1
French                          fr                 ISO-8859-1
German                          de                 ISO-8859-1
Greek                           el                 ISO-8859-7
Hebrew                          he (formerly iw)   ISO-8859-8 ...
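
A servlet could consult such a table programmatically. The following sketch shows one possible lookup keyed by language code, with an optional country qualifier for cases like Traditional Chinese. The class SuggestedCharsets and the method charsetFor() are hypothetical, the map holds only a handful of the rows listed above, and the UTF-8 fallback for unlisted languages is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SuggestedCharsets {
    // A few rows from the table; keys are "language" or "language_COUNTRY".
    private static final Map<String, String> BY_LOCALE = new HashMap<>();
    static {
        BY_LOCALE.put("sq", "ISO-8859-2");
        BY_LOCALE.put("ar", "ISO-8859-6");
        BY_LOCALE.put("bg", "ISO-8859-5");
        BY_LOCALE.put("zh", "GB2312");
        BY_LOCALE.put("zh_TW", "Big5");
        BY_LOCALE.put("cs", "ISO-8859-2");
        BY_LOCALE.put("fr", "ISO-8859-1");
        BY_LOCALE.put("el", "ISO-8859-7");
    }

    // Hypothetical lookup: tries language plus country first, then the
    // language alone, and finally falls back to UTF-8 (an assumption here).
    static String charsetFor(Locale locale) {
        String language = locale.getLanguage();
        String charset = BY_LOCALE.get(language + "_" + locale.getCountry());
        if (charset == null) {
            charset = BY_LOCALE.get(language);
        }
        return charset != null ? charset : "UTF-8";
    }
}
```

For example, Locale.TAIWAN resolves to Big5 through the country-qualified key, while Locale.CHINA has no qualified entry and resolves to GB2312 through the plain zh key.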
