
                output[ixOut++] = input[ixIn++];
            }
        }
        String output_string = new String(output);
        return output_string.substring(0,ixOut);
    }
}
Appendix C provides a half- to full-width katakana conversion program written in Perl.
It is different in that it is not based on Unicode, but rather supports EUC-JP and Shift-JIS
encodings.
Encoding Repair
ISO-2022-JP–encoded files often become damaged or corrupt from software that strips
out escape characters. Some programs have a tendency to filter out control characters
from files; the escape character (0x1B or U+001B), which is an essential part of
ISO-2022-JP encoding, is a control character. Luckily, there are ways to repair corrupted
ISO-2022-JP–encoded files.
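To make the damage concrete, the following minimal sketch imitates a filter that drops control characters. The class and method names are hypothetical, and the string is treated as a sequence of 7-bit bytes, one per char. The sample is the word 日本語 in ISO-2022-JP: the escape sequence for two-byte mode, the three JIS X 0208 codes 0x467C, 0x4B5C, and 0x386C, and the escape sequence back to one-byte mode.

```java
final class EscapeStrippingDemo {
    private static final char ESC = '\u001B';

    /**
     * Hypothetical filter: keeps printable characters plus tab, LF, and CR,
     * and silently drops every other control character, including ESC.
     */
    static String stripControls(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x20 || c == '\t' || c == '\n' || c == '\r') {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // ESC $ B  F|  K\  8l  ESC ( B
        String intact = ESC + "$BF|K\\8l" + ESC + "(B";
        // Only the printable remnants of the escape sequences survive.
        System.out.println(stripControls(intact));
    }
}
```

After filtering, only the printable part of each escape sequence remains, so a reader that expects ISO-2022-JP treats the whole line as one-byte text.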
You can make a few assumptions before you proceed to repair damaged ISO-2022-JP–encoded
files. The first assumption is that the text stream begins, and also ends, in
one-byte mode. In addition, each line begins and ends in one-byte mode. The next
assumption is that the other characters that make up a complete escape sequence are still
intact. These may include such strings as $@, $B, (J, and (B. Depending on the
n-byte-per-character mode, you need to scan for different strings.
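The assumptions above can be turned into a small line-by-line repair routine. The following is only a sketch, not the book's program, and the class and method names are hypothetical. Two details go beyond what the text states: any escape characters that survived are removed first so that nothing is doubled, and in two-byte mode the line is read in two-byte steps, looking for (J or (B at those boundaries. As before, each char of the string stands for one 7-bit byte.

```java
final class Iso2022JpRepairSketch {
    private static final char ESC = '\u001B';

    /** Re-inserts missing ESC characters into one line of damaged text. */
    static String repairLine(String line) {
        // Assumption: drop surviving ESC characters so intact sequences are not doubled.
        String s = line.replace(String.valueOf(ESC), "");
        StringBuilder out = new StringBuilder(s.length() + 8);
        boolean twoByte = false; // every line is assumed to start in one-byte mode
        int i = 0;
        while (i < s.length()) {
            if (!twoByte) {
                if (s.startsWith("$B", i) || s.startsWith("$@", i)) {
                    // Remnant of a two-byte designation: restore ESC, switch modes.
                    out.append(ESC).append(s, i, i + 2);
                    twoByte = true;
                    i += 2;
                } else {
                    out.append(s.charAt(i++));
                }
            } else if (s.startsWith("(B", i) || s.startsWith("(J", i)) {
                // Remnant of a one-byte designation: restore ESC, switch back.
                out.append(ESC).append(s, i, i + 2);
                twoByte = false;
                i += 2;
            } else {
                // Copy one two-byte character (or a lone trailing byte).
                int end = Math.min(i + 2, s.length());
                out.append(s, i, end);
                i = end;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String repaired = repairLine("abc$BF|K\\8l(Bdef");
        // Show ESC visibly instead of writing the raw control character.
        System.out.println(repaired.replace(String.valueOf(ESC), "<ESC>"));
    }
}
```

The sketch cannot tell a damaged escape sequence from one-byte text that happens to contain the same two characters, so its output should be checked before the original file is replaced.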
While in one-byte mode, you need only to scan for the string $B or $@, which should
signify the beginning of two-byte mode. The chances of encountering such