
be reserved for use as “escape” octets, specifying that the octet together with a certain
number of subsequent octets forms a multioctet encoded presentation of one character.
As you can see, the number in the names UTF-32, UTF-16, UTF-8, and UTF-7 indicates
the size of the code unit in bits.
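As a rough illustration of code unit sizes, the following Python sketch counts how many code units the same short string occupies in UTF-32, UTF-16, and UTF-8. The sample string and the helper name code_units are assumptions made for this example. UTF-7 is left out because its 7-bit units are stored in octets, so the byte length does not translate directly.

```python
# Sample text (an assumption for this sketch): a, é, €, and U+1D11E,
# a character outside the Basic Multilingual Plane.
text = "a\u00e9\u20ac\U0001D11E"

def code_units(s: str, codec: str, unit_bits: int) -> int:
    """Return the number of code units of the given size in the encoded data.

    Hypothetical helper: the explicit "-le" codecs are used so that
    no byte order mark is added to the output.
    """
    return len(s.encode(codec)) * 8 // unit_bits

units_utf32 = code_units(text, "utf-32-le", 32)  # one unit per character
units_utf16 = code_units(text, "utf-16-le", 16)  # surrogate pair for U+1D11E
units_utf8 = code_units(text, "utf-8", 8)        # one to four octets per character

print(units_utf32, units_utf16, units_utf8)
```

For this sample the counts are 4, 5, and 10: the smaller the code unit, the more units are needed for characters with large code numbers.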
Saving as Unicode
Many programs let you save your data in different encodings. Even the Save As dialog
in Notepad has some alternatives, such as “ANSI” (which means windows-1252),
“Unicode” (which means UTF-16 in little-endian byte order), “Unicode big-endian”
(which means UTF-16 with the byte order swapped, i.e., big-endian), and “UTF-8”
(which surprisingly means UTF-8).
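To see what these choices mean at the octet level, here is a minimal Python sketch that encodes one string in the four ways listed. The mapping from Notepad's labels to Python codec names is an assumption made for this example, as is the sample text. Notepad typically also writes a byte order mark at the start of the file, which the sketch leaves out.

```python
text = "na\u00efve \u20ac5"  # "naïve €5", sample text assumed for this sketch

# Assumed correspondence between the Notepad labels and Python codec names.
notepad_encodings = {
    "ANSI": "cp1252",                   # windows-1252
    "Unicode": "utf-16-le",             # UTF-16, little-endian
    "Unicode big-endian": "utf-16-be",  # UTF-16, big-endian
    "UTF-8": "utf-8",
}

# Encode the same text once per label and show the resulting octets.
encoded = {label: text.encode(codec) for label, codec in notepad_encodings.items()}

for label, data in encoded.items():
    print(f"{label:20} {len(data):2d} octets  {data.hex(' ')}")
```

The two UTF-16 rows contain the same octets with each pair swapped, while windows-1252 uses a single octet per character and can represent only the characters in its own repertoire.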
Advanced software that has been especially designed for multilingual applications
typically contains explicit options for setting the encoding. Figure 3-6 illustrates this for
BabelPad, the editor discussed in Chapter 1. You can choose UTF-8, UTF-16, or
UTF-32 (as Character Encoding Scheme) from a drop-down menu, and then (when
applicable) select the byte order. There is also a setting for the newline convention,
i.e., which character or characters are used to indicate a line break (see Chapter 8); this
is logically distinct from any encoding issues but often presented along with encoding
for practical reasons.
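The independence of the newline convention from the encoding can be sketched in Python as two separate steps: first choose the line break characters, then encode. The helper to_file_bytes and its parameters are hypothetical names for this example and do not describe how BabelPad itself works.

```python
import codecs

def to_file_bytes(text: str, codec: str, newline: str, bom: bytes = b"") -> bytes:
    """Hypothetical helper: apply a newline convention, then an encoding.

    The newline choice operates on characters; the codec and the optional
    byte order mark operate on octets. Neither depends on the other.
    """
    with_newlines = text.replace("\n", newline)
    return bom + with_newlines.encode(codec)

sample = "A\nB"

# Same text, two combinations of byte order and newline convention.
utf16_be_crlf = to_file_bytes(sample, "utf-16-be", "\r\n", codecs.BOM_UTF16_BE)
utf16_le_lf = to_file_bytes(sample, "utf-16-le", "\n", codecs.BOM_UTF16_LE)

print(utf16_be_crlf.hex(" "))
print(utf16_le_lf.hex(" "))
```

Any newline convention can be combined with any encoding and byte order, which is why editors can offer them as separate settings even when they appear in the same dialog.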
On the other hand, many text-processing and other application programs do not let
you control the character encoding. They use their built-in settings for ...