
Code Conversion Algorithms
However, when dealing with conversion between character sets of different locales, such
as between GB 2312-80 and Big Five, the relationship is not always one-to-one, and thus
round-trip conversion is not always possible. For this reason, and given the extent to
which Unicode is supported in today’s OSes and applications, it makes sense to keep your
data in Unicode, unless you have a very good reason not to do so.
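The following is a minimal sketch, not a definitive implementation, of why conversion across locales can fail. It assumes Python's built-in gb2312 and big5 codecs as stand-ins for GB 2312-80 and Big Five, and the helper name convert_via_unicode is hypothetical. It converts by way of Unicode, and shows one character that survives the round trip and one simplified form that has no Big Five code point at all.

```python
def convert_via_unicode(data: bytes, source: str, target: str) -> bytes:
    """Decode from the source encoding to Unicode, then encode to the target.

    Hypothetical helper for illustration; raises UnicodeEncodeError when the
    target character set has no code point for a character.
    """
    return data.decode(source).encode(target)


# U+4E2D is present in both GB 2312-80 and Big Five, so it round-trips.
common = "\u4e2d".encode("gb2312")
as_big5 = convert_via_unicode(common, "gb2312", "big5")
round_trip_ok = convert_via_unicode(as_big5, "big5", "gb2312") == common

# U+4EEC is a simplified form found in GB 2312-80 but absent from Big Five,
# so the conversion cannot be completed, let alone reversed.
simplified_only = "\u4eec".encode("gb2312")
try:
    convert_via_unicode(simplified_only, "gb2312", "big5")
    converted = True
except UnicodeEncodeError:
    converted = False
```

Keeping the data in Unicode avoids the second case entirely, because the character is lost only at the moment it is forced into a character set that does not include it.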
The code conversion techniques in this section cover Unicode, along with two
CJKV-specific encoding methods: ISO-2022 and EUC. The Japanese-specific Shift-JIS encoding
method is also covered, as is conversion to and from Row-Cell notation. These techniques
can be easily applied to any CJKV locale as long as their character sets are based on these
encoding methods or notations.
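As a rough illustration of how closely these notations are related, the sketch below converts a Row-Cell value to its two-byte ISO-2022 and EUC representations. It assumes a 94×94 character set, covers only EUC code set 1, and omits the escape sequences that a complete ISO-2022 stream requires. The function names are hypothetical and are not taken from the sections that follow.

```python
def _check(row: int, cell: int) -> None:
    """Row and cell values in a 94x94 character set run from 1 to 94."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1-94")


def row_cell_to_iso2022(row: int, cell: int) -> bytes:
    """Seven-bit form: add 0x20 to each value (escape sequences not included)."""
    _check(row, cell)
    return bytes([row + 0x20, cell + 0x20])


def row_cell_to_euc(row: int, cell: int) -> bytes:
    """EUC code set 1: add 0xA0 to each value, which sets the high bit."""
    _check(row, cell)
    return bytes([row + 0xA0, cell + 0xA0])


def euc_to_row_cell(data: bytes) -> tuple[int, int]:
    """Reverse of row_cell_to_euc for a single two-byte character."""
    if len(data) != 2:
        raise ValueError("expected exactly two bytes")
    row, cell = data[0] - 0xA0, data[1] - 0xA0
    _check(row, cell)
    return row, cell


# Row 16, cell 1 of JIS X 0208 is the first kanji in the set (U+4E9C).
iso_bytes = row_cell_to_iso2022(16, 1)
euc_bytes = row_cell_to_euc(16, 1)
```

Because each conversion is a fixed offset applied to both bytes, the same arithmetic applies to any locale whose character set is laid out as a 94×94 matrix.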
It is best to treat the vendor encoding methods, as described in Appendix F, as exceptional
cases. It is also best to avoid using such encoding methods and character sets if your
software requires the maximum amount of flexibility and information interchange; this is a
portability issue.
The following sections contain more detailed information about dealing with the
conversion of these and other encoding methods. Two of the conversion algorithms require the
use of somewhat complex functions for maximum efficiency (at least, when writing code
in a language other ...