
Advice to Developers
Some encodings are damaged at the bit level, meaning that specific bits change value. Bits
exist in a binary condition and are either on or off. The bits that are damaged are
effectively switched from on to off. Eight-bit encodings that have had their eighth bits
stripped are such cases. Manual repair of EUC encoding is not terribly painful: you simply
turn on, or enable, the eighth bit of the byte sequences that appear as garbage. The only
problem is detecting which bytes are used to compose two-byte characters, and this is where
human intervention and interaction are required. Table 4-92 uses the same Japanese string
as the ISO-2022-JP–encoded example, but demonstrates how EUC-JP and Shift-JIS encodings
can become damaged.
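The repair described above can be sketched in a few lines of Python. This is a minimal sketch, assuming Python's built-in `euc_jp` codec and text that contains only JIS X 0208 characters (EUC-JP code set 1, bytes 0xA1 to 0xFE); half-width katakana and JIS X 0212 characters, introduced by 0x8E and 0x8F, are not handled. The helper names `strip_eighth_bit` and `repair_euc_jp` are hypothetical. The byte ranges of two-byte characters are passed in explicitly, standing in for the human judgment the text mentions, because a stripped code set 1 byte (0x21 to 0x7E) looks exactly like printable ASCII.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Simulate the damage: clear bit 8 (0x80) in every byte."""
    return bytes(b & 0x7F for b in data)


def repair_euc_jp(damaged: bytes, spans: list[tuple[int, int]]) -> str:
    """Set bit 8 again inside each half-open (start, end) byte range that a
    person has identified as two-byte EUC-JP characters, then decode.

    Hypothetical helper: it assumes every span covers whole JIS X 0208
    characters (code set 1), so each span must have an even length.
    """
    repaired = bytearray(damaged)
    for start, end in spans:
        if (end - start) % 2 != 0:
            raise ValueError(f"span {start}:{end} splits a two-byte character")
        for i in range(start, end):
            repaired[i] |= 0x80
    return repaired.decode("euc_jp")


original = "これは和文の文章の例で、それは English の文章の例です。"
damaged = strip_eighth_bit(original.encode("euc_jp"))
print(damaged.decode("ascii"))
# prints: $3$l$OOBJ8$NJ8>O$NNc$G!"$=$l$O English $NJ8>O$NNc$G$9!#

# The "human" step: mark the two runs of garbage around " English ".
# 15 characters (30 bytes), then 9 ASCII bytes, then 8 characters (16 bytes).
spans = [(0, 30), (39, 55)]
print(repair_euc_jp(damaged, spans) == original)  # True
```

Setting bit 8 outside the marked spans would turn the ASCII word " English " into kanji garbage, which is why the span boundaries, and not the bit flipping itself, are the hard part.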
Table 4-92. Damaged encoding examples: ISO-2022-JP, EUC-JP, and Shift-JIS encodings
Encoding                 String
Original text            これは和文の文章の例で、それは English の文章の例です。
ISO-2022-JP (damaged)    $B$3$l$OOBJ8$NJ8>O$NNc$G!"$=$l$O(J English $B$NJ8>O$NNc$G$9!#(J
EUC-JP (damaged)         $3$l$OOBJ8$NJ8>O$NNc$G!"$=$l$O English $NJ8>O$NNc$G$9!#
Shift-JIS (damaged)      1 j M a 6 L 6 M L a E A ; j M English L 6 M L a E 7 B
Shift-JIS (damaged)      1jMa6L6MLaEA;jM English L6MLaE7B
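The Shift-JIS rows of Table 4-92 can be reproduced with a short Python sketch. It assumes Python's built-in `shift_jis` codec and a display that hides control codes; `strip_eighth_bit` and `visible_text` are hypothetical helper names. Clearing bit 8 turns lead bytes in the range 0x81 to 0x9F (all of the lead bytes in this example) into C0 control codes, so only the trail bytes survive as visible characters; lead bytes 0xE0 to 0xEF would instead become the printable characters 0x60 to 0x6F. Trail bytes in the range 0x80 to 0xFC also lose their high bit and can land on the same values as trail bytes in 0x40 to 0x7E, so the damage is lossy.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Simulate the damage: clear bit 8 (0x80) in every byte."""
    return bytes(b & 0x7F for b in data)


def visible_text(data: bytes) -> str:
    """Keep only printable ASCII (0x20-0x7E), as a display that hides
    control codes would show it."""
    return "".join(chr(b) for b in data if 0x20 <= b <= 0x7E)


original = "これは和文の文章の例で、それは English の文章の例です。"
damaged = strip_eighth_bit(original.encode("shift_jis"))

# In the first 15 characters, each lead byte has become a C0 control code...
print(all(b < 0x20 for b in damaged[0:30:2]))  # True
# ...so only the stripped trail bytes remain visible.
print(visible_text(damaged))  # 1jMa6L6MLaEA;jM English L6MLaE7B

# The damage is lossy: fullwidth "Ａ" (0x82 0x60) and "も" (0x82 0xE0)
# collapse to the same two bytes once bit 8 is cleared.
print(strip_eighth_bit("Ａ".encode("shift_jis")) == strip_eighth_bit("も".encode("shift_jis")))  # True
```

The collision at the end suggests why a simple bit-setting repair like the one for EUC-JP cannot, on its own, recover Shift-JIS text.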
For EUC-JP encoding, the crucial context-forming byte sequences, specifically “$B” and
“(J” from the ISO-2022-JP encoding escape sequences, are missing. For Shift-JIS encoding,
the results are much different in terms of what appears