
Chapter 4: Encoding Methods
Before I describe each Unicode encoding form, it is useful to introduce some
new concepts that will make these encoding forms easier to understand. The sections
that follow draw attention to special characters or properties of Unicode. Knowing when
these special characters should be used, and understanding the properties, will help
guide you through the rest of this chapter.
Special Unicode Characters
Before we dive into full descriptions and explanations of the various Unicode encoding
forms, I first want to draw your attention to five special characters in Unicode that
are worth mentioning in the context of this book, all of which are listed in Table 4-3,
along with their representations in the Unicode encoding forms that are covered in this
chapter.
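Table 4-3. Special Unicode characters
Character name                    Unicode   UTF-32BE      UTF-32LE      UTF-16BE   UTF-16LE   UTF-8
Ideographic Space                 U+3000    00 00 30 00   00 30 00 00   30 00      00 30      E3 80 80
Geta Mark                         U+3013    00 00 30 13   13 30 00 00   30 13      13 30      E3 80 93
Ideographic Variation Indicator   U+303E    00 00 30 3E   3E 30 00 00   30 3E      3E 30      E3 80 BE
Byte Order Mark (BOM)             U+FEFF    00 00 FE FF   FF FE 00 00   FE FF      FF FE      EF BB BF
Replacement Character             U+FFFD    00 00 FF FD   FD FF 00 00   FF FD      FD FF      EF BF BD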
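The byte sequences for the five characters in Table 4-3 can be checked with a few lines of Python. The following is a minimal sketch rather than anything taken from this book: the names SPECIAL_CODE_POINTS, ENCODINGS, and encoded_forms are hypothetical, and the code relies only on Python's standard codecs and the unicodedata module. Note that the Unicode character database gives U+FEFF the formal name ZERO WIDTH NO-BREAK SPACE, so that is what the script prints for the Byte Order Mark.

```python
import unicodedata

# Code points of the five special characters (hypothetical constant name).
SPECIAL_CODE_POINTS = [0x3000, 0x3013, 0x303E, 0xFEFF, 0xFFFD]

# Python codec names for the encoding forms covered in this chapter.
# The explicit "-be" and "-le" codecs do not prepend a BOM of their own.
ENCODINGS = ["utf-32-be", "utf-32-le", "utf-16-be", "utf-16-le", "utf-8"]


def encoded_forms(code_point):
    """Map each codec name to the character's bytes as spaced uppercase hex."""
    ch = chr(code_point)
    return {
        enc: " ".join(f"{byte:02X}" for byte in ch.encode(enc))
        for enc in ENCODINGS
    }


for cp in SPECIAL_CODE_POINTS:
    # unicodedata.name() returns the formal Unicode character name.
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}")
    for enc, hex_bytes in encoded_forms(cp).items():
        print(f"    {enc:<10} {hex_bytes}")
```

Calling encoded_forms(0x3000), for example, returns a dictionary keyed by codec name, so a single encoding form can be looked up with encoded_forms(0x3000)["utf-8"].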
The Ideographic Space (U+3000) character is special in the sense that it is easily confused
with the Em Space ...