Character Data Encoding

All Java Strings are stored as sequences of 16-bit Unicode (UTF-16) code units. Unicode is a standard created specifically for computer processing of character data. Its purpose is to provide a consistent way to encode character data, so that users throughout the world, writing in many languages, can share a single system.
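As a rough illustration of the 16-bit representation, the sketch below inspects a few strings. The class name CodeUnitDemo and its helper methods are made up for this example; the calls they use (String.length, String.codePointCount, Character.SIZE) are standard Java APIs. It also shows that a character outside the 16-bit range takes two char values in a String.

```java
public class CodeUnitDemo {
    // Hypothetical helper: number of 16-bit char values (UTF-16 code units) in s.
    static int codeUnits(String s) {
        return s.length();
    }

    // Hypothetical helper: number of Unicode code points (whole characters) in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String latin = "A";            // Latin capital letter A
        String greek = "\u03A9";       // Greek capital letter omega
        String emoji = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair

        System.out.println("bits per char: " + Character.SIZE);
        System.out.println("latin: " + codeUnits(latin) + " unit(s), " + codePoints(latin) + " code point(s)");
        System.out.println("greek: " + codeUnits(greek) + " unit(s), " + codePoints(greek) + " code point(s)");
        System.out.println("emoji: " + codeUnits(emoji) + " unit(s), " + codePoints(emoji) + " code point(s)");
    }
}
```

Letters from different scripts each occupy one 16-bit char here, while the emoji needs two, which is why length() counts code units rather than characters.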

The problem that Unicode solves was introduced by ASCII character encoding, which represents the unaccented Latin alphabet used in English beautifully, but nothing else. This is no longer an acceptable basis for character data exchange in the Internet age. ASCII characters need only 7 bits each (and are usually stored in 8-bit bytes), but that gives a very limited range of 128 characters; Unicode represents all of the characters from every major written language ...
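The ASCII limitation described above can be seen with a minimal sketch that encodes a string to bytes and decodes it again. The class AsciiLimitDemo and its two methods are hypothetical names chosen for this example; String.getBytes(Charset) and StandardCharsets are standard Java APIs. UTF-8 is used here simply as one Unicode encoding for comparison.

```java
import java.nio.charset.StandardCharsets;

public class AsciiLimitDemo {
    // Hypothetical helper: encode to ASCII bytes, then decode back.
    // Characters ASCII cannot represent are replaced with '?' during encoding.
    static String roundTripAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Hypothetical helper: the same round trip through UTF-8, a Unicode encoding.
    static String roundTripUtf8(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String plain = "cafe";
        String accented = "caf\u00E9"; // ends with e-acute, which is outside ASCII

        System.out.println(roundTripAscii(plain));    // survives unchanged
        System.out.println(roundTripAscii(accented)); // accented letter becomes '?'
        System.out.println(roundTripUtf8(accented));  // survives unchanged
    }
}
```

Plain English text survives the ASCII round trip, but the accented letter is lost, whereas the Unicode encoding preserves it.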
