Here, put this fish in your ear.
We live in a diverse and ever-changing universe. Even on this rather mundane M-class planet we call Earth, we speak hundreds of different languages. In The Hitchhiker's Guide to the Galaxy, Arthur Dent solved his language problem by placing a Babelfish in his ear. He could then understand the languages spoken by the diverse (to say the least) characters he encountered along his involuntary journey through the galaxy.
On the Java platform, we don’t have the luxury of Babelfish technology (at least not yet). We must still deal with multiple languages and the many characters that comprise those languages. Luckily, Java was the first widely used programming language to use Unicode internally to represent characters. Compared to byte-oriented programming languages such as C or C++, native support of Unicode greatly simplifies character data handling, but it by no means makes character handling automatic. You still need to understand how character mapping works and how to handle multiple character sets.
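To make the point about character mapping concrete, here is a minimal sketch showing that the same Java string, held internally as Unicode, turns into different byte sequences depending on the character set used to encode it. The class name EncodingSizes and the helper encodedLength are hypothetical names chosen for illustration, and the sketch assumes Java 6 or later for String.getBytes(Charset).

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of any library.
class EncodingSizes {

    // Returns how many bytes the text occupies when encoded
    // with the named character set.
    static int encodedLength(String text, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the final e (U+00E9).
        String text = "caf\u00e9";

        // Four characters in Java's internal representation.
        System.out.println(text.length());                       // prints 4

        // One byte per character in ISO-8859-1.
        System.out.println(encodedLength(text, "ISO-8859-1"));   // prints 4

        // U+00E9 needs two bytes in UTF-8.
        System.out.println(encodedLength(text, "UTF-8"));        // prints 5

        // Two bytes per character in UTF-16BE (no byte-order mark).
        System.out.println(encodedLength(text, "UTF-16BE"));     // prints 8
    }
}
```

The string never changes; only the mapping from characters to bytes does. That mapping is what you still have to choose and manage yourself, even with native Unicode support.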
Before discussing the details of the new classes, let’s define some terms related to character sets and character transcoding. The new character set classes present a more standardized approach to this realm, so it’s important to be clear on the terminology used.
A set of characters, i.e., symbols with specific semantic meanings. The letter “A” is a character. ...