Chapter 6. Character Sets

Here, put this fish in your ear.

Ford Prefect

We live in a diverse and ever-changing universe. Even on this rather mundane M-class planet we call Earth, we speak hundreds of different languages. In The Hitchhiker's Guide to the Galaxy, Arthur Dent solved his language problem by placing a Babelfish in his ear. He could then understand the languages spoken by the diverse (to say the least) characters he encountered along his involuntary journey through the galaxy.[1]

On the Java platform, we don’t have the luxury of Babelfish technology (at least not yet).[2] We must still deal with multiple languages and the many characters that make up those languages. Luckily, Java was the first widely used programming language to use Unicode internally to represent characters. Compared to byte-oriented programming languages such as C or C++, native support of Unicode greatly simplifies character data handling, but it by no means makes character handling automatic. You still need to understand how character mapping works and how to handle multiple character sets.
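
To see why character handling is not automatic, consider what happens when the same Unicode string is converted to bytes under different character sets. The following is a minimal sketch rather than code from this chapter: the class name EncodingSketch and the helper methods encode() and hex() are invented for illustration, and the code assumes a current JDK (it uses the for-each loop and String.format(), which postdate the release that introduced NIO). The only character set API it relies on is Charset.forName() and Charset.encode().

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical illustration class; not part of the JDK or of this book's examples.
class EncodingSketch {
    // Encode text with the named charset and return exactly the bytes produced.
    static byte[] encode(String text, String charsetName) {
        ByteBuffer buf = Charset.forName(charsetName).encode(text);
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    // Render bytes as space-separated, two-digit uppercase hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the final letter (U+00E9).
        String text = "caf\u00e9";
        String[] names = {"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE"};
        for (String name : names) {
            System.out.println(name + ": " + hex(encode(text, name)));
        }
    }
}
```

The four characters are identical in every case, but the bytes are not. ISO-8859-1 stores the accented letter as the single byte E9, UTF-8 stores it as the two bytes C3 A9, and UTF-16BE uses two bytes for every character. US-ASCII cannot represent the accented letter at all, so Charset.encode() silently substitutes the charset's replacement byte, a question mark (3F).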

Character Set Basics

Before discussing the details of the new classes in java.nio.charset, let’s define some terms related to character sets and character transcoding. The new character set classes present a more standardized approach to this realm, so it’s important to be clear on the terminology used.

Character set

A set of characters, i.e., symbols with specific semantic meanings. The letter “A” is a character. ...
