
Chapter 1: CJKV Information Processing Overview
themselves—not the underlying bits of each byte—are reversed depending on endianness.
This is precisely why endianness is also referred to as byte order. The term endian is used
to describe what impact the byte at the end has on the overall value. The UTF-16 value
for the ASCII “space” character, U+0020, is <00 20> for big-endian machines and <20 00>
for little-endian ones.
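As a quick check of the two byte orders, the following minimal Python sketch encodes the ASCII space character as UTF-16 both ways. The helper name `hex_bytes` is made up for this illustration; the codec names `utf-16-be` and `utf-16-le` are standard Python codecs.

```python
# Encode the ASCII space character (U+0020) as UTF-16 in both byte orders.
space = "\u0020"

big_endian = space.encode("utf-16-be")     # most significant byte first
little_endian = space.encode("utf-16-le")  # least significant byte first


def hex_bytes(data: bytes) -> str:
    """Hypothetical helper: format bytes as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


print(hex_bytes(big_endian))     # 00 20
print(hex_bytes(little_endian))  # 20 00
```

Only the order of the two bytes differs between the results. The bits inside each byte are untouched.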
Now that you understand the concept of endianness, the real question that needs answer-
ing is when endianness matters. Please keep reading….
What Are Multiple-Byte and Wide Characters?
If you have ever read comprehensive books and materials about ANSI C, you more than
likely came across the terms multiple-byte and wide characters. Those documents typically
don’t do those terms justice, in that they are not fully explained. Here you’ll get a
definitive answer.
When dealing with encoding methods that are processed on a per-byte basis, endianness
or byte order is irrelevant. The bytes that compose each character have only one order,
regardless of the underlying architecture. These encoding methods support what are known
as multiple-byte characters. In other words, these encoding methods use the byte as their
code unit.
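The contrast can be sketched in Python under a few assumptions: EUC-JP is chosen here only as one example of a byte-oriented encoding, the character 漢 (U+6F22) is an arbitrary sample, and `hex_bytes` is a hypothetical helper. The byte-oriented encoding yields a single byte sequence, whereas a 16-bit code unit can be serialized in two ways.

```python
ch = "\u6f22"  # the character 漢, used purely as an illustration


def hex_bytes(data: bytes) -> str:
    """Hypothetical helper: format bytes as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# Byte-oriented encoding: the code unit is a single byte, so the
# serialized form is the same on every architecture.
euc = ch.encode("euc_jp")

# 16-bit code units: the two bytes of each unit may be stored in
# either order, so two serialized forms exist.
utf16_be = ch.encode("utf-16-be")
utf16_le = ch.encode("utf-16-le")

print(hex_bytes(euc))       # B4 C1
print(hex_bytes(utf16_be))  # 6F 22
print(hex_bytes(utf16_le))  # 22 6F
```

There is no big-endian or little-endian variant of the `euc_jp` codec to choose from, which reflects the point that byte order is irrelevant when the byte is the code unit.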
So, what encoding methods support multiple-byte characters? Table 1-21 provides an in-
complete yet informative list of encoding methods that support multiple-b ...