
Unicode Encoding Methods
Table 4-21. UTF-7 encoding example
Encoding form String
Characters
UCS-2 6 3 8 5
UCS-2 bit arrays 1111 11111 111 1111
Six-bit segments 1111 111 111 11 111 1
UTF-7 2B 62 4C 4F 4D 77 67 2D
UTF-7—visual +bLOMwg-
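The steps in the table can be sketched in Python as follows. This is a minimal illustration, not a complete UTF-7 encoder: the helper name utf7_shift_sequence is hypothetical, it places every character inside a single shifted run (real encoders leave most ASCII characters unshifted), and the sample text is simply whatever Python's built-in utf-7 codec decodes from the visual form shown in the table.

```python
# Base64 alphabet used inside a UTF-7 shifted run (no '=' padding is emitted).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def utf7_shift_sequence(text):
    """Encode text as one shifted run: '+', Base64 characters, then '-'."""
    data = text.encode("utf-16-be")                  # big-endian 16-bit code units, no BOM
    bits = "".join(f"{byte:08b}" for byte in data)   # one continuous bit array
    bits += "0" * (-len(bits) % 6)                   # zero-fill the final six-bit segment
    segments = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    return "+" + "".join(B64[int(seg, 2)] for seg in segments) + "-"

# Assumption: the sample is recovered from the table's visual form.
sample = b"+bLOMwg-".decode("utf-7")
encoded = utf7_shift_sequence(sample)
print(encoded)
```

The result can be compared against sample.encode("utf-7") as a sanity check on the six-bit splitting.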
Because UTF-7 is a seven-bit, byte-based encoding whose code units are seven-bit bytes,
there is no need to explicitly indicate byte order. This means that the BOM can effectively
be removed when converting to UTF-7 encoding. Of course, the BOM should be reintroduced
when converting to the UCS-2, UCS-4, UTF-16, or UTF-32 encoding forms.
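A minimal Python sketch of this BOM handling follows. It assumes the standard library codecs: the generic utf-16 decoder consumes a leading BOM, and the BOM is prepended by hand when converting back to big-endian UTF-16. The sample text is an arbitrary choice for illustration.

```python
import codecs

# Assumption: sample text taken from a UTF-7 string; any BMP text would do.
text = b"+bLOMwg-".decode("utf-7")

# Big-endian UTF-16 data that starts with a BOM (FE FF).
utf16_with_bom = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# The generic "utf-16" decoder reads the BOM to pick the byte order, then drops it.
decoded = utf16_with_bom.decode("utf-16")

# The UTF-7 form carries no BOM.
utf7 = decoded.encode("utf-7")

# Converting back: reintroduce the BOM explicitly.
restored = codecs.BOM_UTF16_BE + utf7.decode("utf-7").encode("utf-16-be")
print(utf7, restored == utf16_with_bom)
```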
UTF-7 encoding, by definition, interoperates only with UCS-2 and UTF-16 encodings,
and is applied in big-endian byte order. UTF-7 encoding can obviously be converted to
UTF-8, UCS-4, or UTF-32 encoding, given that their encoding spaces are a superset of
that of UTF-7 encoding. This means that if UTF-7–encoded data is encountered, it can be
converted to a currently supported Unicode encoding form. The UTF-16 Surrogates are
treated as though they were UCS-2–encoded, meaning no special treatment.
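The surrogate handling can be illustrated with a short Python sketch. It assumes Python's built-in utf-7 codec and uses U+20000, a character outside the BMP chosen only as an example. The two surrogate code units pass through the Base64 step like any other 16-bit units, and the decoded result converts directly to UTF-8.

```python
import base64

ch = "\U00020000"                      # example character outside the BMP (an assumption)
units = ch.encode("utf-16-be")         # high and low surrogates, big-endian

# Base64 over the raw 16-bit units, with '=' padding stripped, wrapped in '+' and '-'.
by_hand = b"+" + base64.b64encode(units).rstrip(b"=") + b"-"

via_codec = ch.encode("utf-7")         # built-in codec, for comparison
as_utf8 = via_codec.decode("utf-7").encode("utf-8")
print(by_hand, via_codec, as_utf8)
```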
Also note how this example necessarily spans character boundaries. This is because Base64
transformation spans byte boundaries due to the splitting of multiple bytes into six-bit
segments. UTF-7 clearly makes it challenging to perform many common text-processing
tasks, such as character ...