Character Data Encoding
Java stores every String as a sequence of 16-bit Unicode code units (the UTF-16 encoding). Unicode is a standard created specifically for computer processing of character data. Its purpose is to provide a consistent way to encode character data, so that users throughout the world, writing in many languages, can share a single system.
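As a minimal sketch of what 16-bit storage means in practice, the class below counts the 16-bit char units and the Unicode code points in a few strings. The class and method names (Utf16Units, codeUnits, codePoints) are hypothetical and chosen only for illustration; String.length() and String.codePointCount() are standard Java methods. A character outside the 16-bit range is stored as a pair of char values, so the two counts can differ.

```java
public class Utf16Units {
    // Number of 16-bit char units Java uses to store the text.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode characters (code points) the text contains.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String latin = "A";            // U+0041, fits in one 16-bit unit
        String greek = "\u03A9";       // U+03A9 (Greek capital omega), one unit
        String emoji = "\uD83D\uDE00"; // U+1F600, stored as two 16-bit units

        for (String s : new String[] {latin, greek, emoji}) {
            System.out.println(codeUnits(s) + " unit(s), "
                    + codePoints(s) + " code point(s)");
        }
    }
}
```

For most text the two counts agree, which is why code that treats one char as one character usually works. It only goes wrong for characters beyond U+FFFF.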
Unicode solves a problem created by ASCII character encoding, which represents the basic Latin alphabet well, but nothing else. This is no longer an acceptable basis for character data exchange in the Internet age. ASCII characters need only 7 bits each (and are usually stored one per 8-bit byte), but the range is very limited; Unicode represents all of the characters from every major written language ...
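The limitation of ASCII can be seen with a short, hedged example: the sketch below encodes a string to bytes with a given charset and decodes it again. The class and method names (AsciiLimits, roundTrip) are hypothetical; String.getBytes(Charset) and StandardCharsets are standard Java APIs. With US-ASCII, any character outside the ASCII range is replaced by the charset's replacement byte, a question mark, while a Unicode encoding such as UTF-8 preserves the text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiLimits {
    // Encode the text to bytes with the given charset, then decode it back.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // "café": the last letter is outside ASCII

        // US-ASCII cannot represent U+00E9, so it comes back as '?'.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));

        // UTF-8 can represent every Unicode character, so the text survives.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8));
    }
}
```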