Converting Between Byte Arrays and Strings
The java.lang.String class has several constructors that form strings from byte arrays and several methods that return a byte array corresponding to a given string. Any time a Unicode string is converted to bytes or vice versa, that conversion happens according to one of the encodings listed in Table 2.4. The same string can produce different byte arrays if different encodings are used. Six constructors form a new String object from a byte array:
public String(byte[] ascii, int highByte)
public String(byte[] ascii, int highByte, int offset, int length)
public String(byte[] data, String encoding) throws UnsupportedEncodingException
public String(byte[] data, int offset, int length, String encoding) throws UnsupportedEncodingException
public String(byte[] data)
public String(byte[] data, int offset, int length)
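To make the point that one string can yield several byte arrays concrete, here is a minimal sketch. It assumes only the ISO-8859-1, UTF-8 and UTF-16BE encodings, which every Java platform is required to support. The class name EncodingDemo and the helper encodedLength are hypothetical, introduced just for this illustration. The string is encoded with getBytes(String) and decoded with the String(byte[], String) and String(byte[], int, int, String) constructors from the list above.

```java
import java.io.UnsupportedEncodingException;

public class EncodingDemo {
    // Hypothetical helper: number of bytes the string occupies in the named encoding.
    static int encodedLength(String s, String encoding) throws UnsupportedEncodingException {
        return s.getBytes(encoding).length;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "Caf\u00e9"; // "Café"; the last character is U+00E9

        // One string, three different byte arrays
        System.out.println(encodedLength(s, "ISO-8859-1")); // 4: one byte per character
        System.out.println(encodedLength(s, "UTF-8"));      // 5: é takes two bytes
        System.out.println(encodedLength(s, "UTF-16BE"));   // 8: two bytes per character

        byte[] utf8 = s.getBytes("UTF-8");

        // Decoding with the same encoding restores the original string
        System.out.println(new String(utf8, "UTF-8").equals(s)); // true

        // Decoding with the wrong encoding yields a different string:
        // each of the five UTF-8 bytes becomes its own Latin-1 character
        System.out.println(new String(utf8, "ISO-8859-1").length()); // 5

        // The offset/length constructor decodes only part of the array
        System.out.println(new String(utf8, 0, 3, "UTF-8")); // Caf
    }
}
```

The round trip is only lossless when the same encoding is named on both sides; nothing in the byte array itself records which encoding produced it.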
The first two constructors, the ones with the highByte argument, are leftovers from Java 1.0 that were deprecated in Java 1.1. These two constructors do not accurately translate non-Latin-1 character sets into Unicode. Instead, they read each byte in the ascii array as the low-order byte of a two-byte character, then fill in the high-order byte with the highByte argument. For example:
byte[] isoLatin1 = new byte[256];
for (int i = 0; i < 256; i++) isoLatin1[i] = (byte) i;  // bytes 0x00 through 0xFF
String s = new String(isoLatin1, 0);  // deprecated; high byte is 0
Frankly, this is a kludge; it’s deprecated for good reason. This scheme works quite well for Latin-1 data with a high byte of ...
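The scheme these deprecated constructors use can be sketched without calling them. The Javadoc for String(byte[], int) describes each resulting character as the high byte shifted left eight bits, ORed with the unsigned value of the corresponding byte. The helper fromHighByte and the class HighByteDemo below are hypothetical names that simply follow that formula; they are not part of the JDK. The sketch shows why a high byte of 0 happens to match ISO-8859-1 decoding, and why any other high byte does not amount to a real character-set conversion.

```java
import java.nio.charset.StandardCharsets;

public class HighByteDemo {
    // Hypothetical helper: mimics the deprecated String(byte[] ascii, int highByte)
    // constructor by combining highByte (high 8 bits) with each byte (low 8 bits).
    static String fromHighByte(byte[] ascii, int highByte) {
        char[] chars = new char[ascii.length];
        for (int i = 0; i < ascii.length; i++) {
            chars[i] = (char) (((highByte & 0xFF) << 8) | (ascii[i] & 0xFF));
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        byte[] isoLatin1 = new byte[256];
        for (int i = 0; i < 256; i++) isoLatin1[i] = (byte) i;

        // High byte 0: byte b becomes U+00bb, exactly how ISO-8859-1 decodes
        String viaHack = fromHighByte(isoLatin1, 0);
        String viaCharset = new String(isoLatin1, StandardCharsets.ISO_8859_1);
        System.out.println(viaHack.equals(viaCharset)); // true

        // High byte 0x04: every byte lands in U+0400..U+04FF, so even the ASCII
        // byte 'A' (0x41) comes out as U+0441, a Cyrillic letter, whereas real
        // Cyrillic encodings such as ISO-8859-5 keep 0x41 as 'A'
        String shifted = fromHighByte(isoLatin1, 0x04);
        System.out.println(Integer.toHexString(shifted.charAt(0x41))); // 441
    }
}
```

Because the mapping is a fixed shift rather than a table lookup, it can only be correct for a character set whose code points happen to equal Unicode's within one 256-character block, which in practice means Latin-1 with a high byte of 0.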