Character Data
Numbers are only part of the data a typical Java program needs to read and write. Most programs also need to handle text, which is composed of characters. Since computers only really understand numbers, characters are encoded by matching each character in a given script to a particular number. For example, in the common ASCII encoding, the character A is mapped to the number 65; the character B is mapped to the number 66; the character C is mapped to the number 67; and so on. Different encodings may encode different scripts or may encode the same or similar scripts in different ways.
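The mapping between characters and numbers can be observed directly in Java, because a char converts to and from an int. The following is a minimal sketch; the class name CharCodes and its helper methods are illustrative names chosen here, not part of any library.

```java
public class CharCodes {
    // Returns the numeric code of a character. For ASCII characters
    // this is the same value the ASCII table assigns.
    static int codeOf(char c) {
        return (int) c;
    }

    // Returns the character assigned to a numeric code.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('A')); // prints 65
        System.out.println(codeOf('B')); // prints 66
        System.out.println(charOf(67));  // prints C
    }
}
```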
Java understands several dozen different character sets for a variety of languages, ranging from ASCII to Shift-JIS (SJIS, a Japanese encoding) to Unicode. Internally, Java uses the Unicode character set, storing each char as a 16-bit value. Unicode is a superset of the one-byte ISO Latin-1 character set, which in turn is an eight-bit superset of the seven-bit ASCII character set.
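To illustrate how different encodings represent the same character in different ways, the sketch below encodes a string with several of the charsets that java.nio.charset.StandardCharsets guarantees on every Java platform (Java 7 and later). The class name EncodingSizes and the method encodedLength are hypothetical names used only for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    // Returns how many bytes the given text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String a = "A";           // present in ASCII, Latin-1, and Unicode
        String eAcute = "\u00e9"; // e with acute accent: in Latin-1, not in ASCII

        System.out.println(encodedLength(a, StandardCharsets.US_ASCII));        // prints 1
        System.out.println(encodedLength(eAcute, StandardCharsets.ISO_8859_1)); // prints 1
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_8));      // prints 2
        System.out.println(encodedLength(eAcute, StandardCharsets.UTF_16BE));   // prints 2
    }
}
```

The character is written as a Unicode escape so that the result does not depend on the encoding of the source file itself.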
ASCII
ASCII, the American Standard Code for Information Interchange, is a seven-bit character set. Thus it defines 2^7, or 128, different characters whose numeric values range from 0 to 127. These characters are sufficient for handling most of American English and can make reasonable approximations to most European languages (with the notable exceptions of Russian and Greek). It’s an often-used lowest common denominator format for different computers. If you were to read a byte value between 0 and 127 from a stream, then cast it to a char, the ...
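Reading byte values from a stream and casting each one to a char can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the class name AsciiBytes and its methods are hypothetical, and rejecting values above 127 is a choice made for this sketch so that only true ASCII input is accepted.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class AsciiBytes {
    // Reads the stream to its end, casting each byte value in the
    // range 0 to 127 to the char with the same numeric value.
    static String readAscii(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) { // read() returns 0..255, or -1 at end of stream
            if (b > 127) {
                // Assumption of this sketch: anything above 127 is not ASCII.
                throw new IOException("Not an ASCII byte: " + b);
            }
            text.append((char) b);
        }
        return text.toString();
    }

    // Convenience wrapper for in-memory data.
    static String fromBytes(byte[] data) {
        try {
            return readAscii(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {72, 101, 108, 108, 111};
        System.out.println(fromBytes(data)); // prints Hello
    }
}
```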