Java uses the Unicode character encoding. (Java 1.3 uses Unicode Version 2.1; support for Unicode 3.0 will be included in Java 1.4 or another future release.) Unicode is a 16-bit character encoding established by the Unicode Consortium, which describes the standard as follows (see http://unicode.org):
The Unicode Standard defines codes for characters used in the major languages written today. Scripts include the European alphabetic scripts, Middle Eastern right-to-left scripts, and scripts of Asia. The Unicode Standard also includes punctuation marks, diacritics, mathematical symbols, technical symbols, arrows, dingbats, etc. ... In all, the Unicode Standard provides codes for 49,194 characters from the world’s alphabets, ideograph sets, and symbol collections.
In the canonical form of the Unicode encoding, which is what Java String types use, every character occupies two bytes. The Unicode characters \u0000 through \u007E are equivalent to the ASCII and ISO8859-1 (Latin-1) characters 0x00 through 0x7E. The Unicode characters \u0000 through \u00FF are identical to the ISO8859-1 characters 0x00 through 0xFF. Thus, there is a trivial mapping between Latin-1 and Unicode characters. A number of other portions of the Unicode encoding are based on preexisting standards, such as ISO8859-5 (Cyrillic) and ISO8859-8 (Hebrew), though the mappings between these standards and Unicode may not be as trivial as the Latin-1 mapping.
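The trivial Latin-1 mapping can be seen directly in Java: a `char` holds a Unicode code value, and converting a string through the ISO8859-1 charset is a byte-for-byte identity for characters in the \u0000 to \u00FF range. Here is a minimal sketch (the class name `Latin1Demo` is just for illustration):

```java
// Demonstrates the identity between Latin-1 byte values and the
// corresponding Unicode character codes \u0000 through \u00FF.
public class Latin1Demo {
    public static void main(String[] args) throws Exception {
        // '\u0041' is the Unicode escape for 'A'; its code value
        // is the same as the ASCII/Latin-1 byte 0x41.
        char a = '\u0041';
        System.out.println(a == 'A');            // true
        System.out.println((int) '\u00FF');      // 255, same as Latin-1 0xFF

        // Encoding a string as ISO8859-1 yields one byte per character,
        // and decoding those bytes recovers the original string.
        String s = "caf\u00E9";                  // "café"
        byte[] latin1 = s.getBytes("ISO8859-1");
        System.out.println(latin1.length);       // 4: one byte per character
        System.out.println(new String(latin1, "ISO8859-1").equals(s)); // true
    }
}
```

Note that each `char` in the string still occupies two bytes in memory; only the ISO8859-1 byte encoding is one byte per character.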
Note that Unicode support may be limited on many platforms. One of ...