
Chapter 4: Encoding Methods
Other Types of Encoding Repair
As opposed to being reencoded by a well-understood and proven transformation, some
data can become truly damaged in the sense that information is removed, either in terms
of entire bytes or the values of specific bits. Such data appears as garbage, meaning it is
unreadable. This is referred to as mojibake (文字化け mojibake) in Japanese.
We first discuss the repair procedure for ISO-2022-JP–encoded files as an example of how
encodings may become damaged by having key characters removed, and then repaired.
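To make this kind of damage and repair concrete, the following Python fragment is a minimal sketch, not a definitive repair tool. It assumes that the only damage is the removal of the escape byte (0x1B), so that the remainder of each escape sequence (such as `$B` or `(B`) is still present in the data. The function name `restore_escapes` is hypothetical, and the scan is deliberately stateful: while in two-byte mode the data is consumed in pairs, because a pair such as `$B` can be a legitimate two-byte character there rather than the remnant of an escape sequence.

```python
ESC = b"\x1b"

# Remnants left behind when the ESC byte is stripped from ISO-2022-JP data.
TWO_BYTE_SHIFTS = (b"$B", b"$@")  # from ESC $ B and ESC $ @ (JIS X 0208 / JIS C 6226)
ONE_BYTE_SHIFTS = (b"(B", b"(J")  # from ESC ( B (ASCII) and ESC ( J (JIS-Roman)


def restore_escapes(damaged: bytes) -> bytes:
    """Reinsert stripped ESC bytes (hypothetical helper; heuristic only)."""
    out = bytearray()
    two_byte = False  # ISO-2022-JP text starts in one-byte (ASCII) mode
    i = 0
    while i < len(damaged):
        pair = damaged[i:i + 2]
        if not two_byte and pair in TWO_BYTE_SHIFTS:
            # Assumption: in one-byte mode, "$B" or "$@" is an escape remnant.
            out += ESC + pair
            two_byte = True
            i += 2
        elif two_byte and pair in ONE_BYTE_SHIFTS:
            # At a character boundary in two-byte mode, "(B" or "(J" is not
            # an assigned JIS X 0208 character, so treat it as a remnant.
            out += ESC + pair
            two_byte = False
            i += 2
        elif two_byte:
            out += pair  # copy one two-byte character unchanged
            i += 2
        else:
            out += damaged[i:i + 1]  # copy one one-byte character unchanged
            i += 1
    return bytes(out)


if __name__ == "__main__":
    original = "日本語 text".encode("iso2022_jp")
    damaged = original.replace(ESC, b"")  # simulate the stripped escapes
    repaired = restore_escapes(damaged)
    print(repaired.decode("iso2022_jp"))
```

This sketch can misfire on one-byte text that legitimately contains `$B` or `$@`, and it does not handle escape characters that were converted to a space or to Quoted-Printable; those cases need their own preprocessing.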
In the past, one might have received Japanese email messages or attempted to display
Japanese articles from Usenet News that had their “escape” characters stripped out by
unfriendly email or news reading software. Sometimes the escape characters were simply
mangled: converted into a single “space” character (0x20) or into Quoted-Printable
(discussed previously). This was a very annoying problem because one usually threw out such
email messages or articles rather than suffering through the somewhat
grueling task of manually restoring the escape characters. For example, look at the
ISO-2022-JP–encoded string in Table 4-91. It is first shown as it should appear when displayed
properly, and then shown as it could appear if it were damaged, specifically with its escape
characters missing. There are certainly lots ...