
Code Conversion
is a traditional ideograph that is used by Chinese, as used in China, as well as in Taiwan,
Hong Kong, and Korea. There are many more examples of simplified ideographs that are
specific to Chinese, as used in China.
This simplified-ideograph issue can pose a problem if you use an intermediate representation,
such as Unicode, but only if you are not aware of it. Unicode often encodes two versions
of the same ideograph, specifically the simplified and traditional forms, as exemplified
by 91 and 92. In the end, chances are that all the ideographs will convert to an
appropriate form, simply because the author of the original text was able to input them.
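Because the two forms occupy separate Unicode code points, a conversion that pivots through Unicode carries the simplified code point through unchanged; it does not become the traditional form on its own. The following minimal sketch illustrates this with Python's built-in gb2312 and big5 codecs, using 门 (U+95E8, simplified) and 門 (U+9580, traditional) as a stand-in pair. The function name pivot_convert is hypothetical and is not taken from any of the tools discussed here.

```python
# Simplified and traditional forms of "gate" are distinct Unicode code points.
SIMPLIFIED = "\u95e8"   # 门
TRADITIONAL = "\u9580"  # 門


def pivot_convert(data: bytes, source: str, target: str) -> bytes:
    """Hypothetical helper: convert legacy-encoded bytes to another
    legacy encoding by decoding to Unicode and re-encoding.

    Characters that have no code point in the target encoding are
    replaced with '?' instead of raising an error.
    """
    text = data.decode(source)                     # legacy bytes -> Unicode
    return text.encode(target, errors="replace")   # Unicode -> legacy bytes


gb_bytes = SIMPLIFIED.encode("gb2312")
print(f"U+{ord(SIMPLIFIED):04X} vs U+{ord(TRADITIONAL):04X}")
# The simplified code point survives the pivot unchanged; if Big Five has
# no code point for it, it comes out as a '?' replacement byte.
print(pivot_convert(gb_bytes, "gb2312", "big5"))
```

Nothing in the pivot step knows that 门 and 門 are "the same" ideograph. That knowledge has to come from somewhere else.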
To effectively handle characters that have no direct mapping to another character set
according to Unicode, you can use correspondence tables, such as those for
simplified/traditional ideograph pairs and ideograph variants. These can dramatically
improve the accuracy of the conversion. Even so, there will always be some unmappable
characters. This is unavoidable.
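The fallback strategy described above can be sketched as follows. This is a minimal illustration, assuming Python's built-in codecs and a hypothetical three-entry simplified-to-traditional table (a real table would cover thousands of pairs and variants). The names SIMP_TO_TRAD, TRAD_TO_SIMP, and convert are illustrative only, and the code does not reflect how ICU, RCLU, tcs, or CJKVConv.pl actually work. Each character is first encoded directly. If that fails, the table's counterpart is tried. Anything still left is replaced and reported as unmappable.

```python
# Hypothetical correspondence table (illustrative only).
SIMP_TO_TRAD = {
    "\u95e8": "\u9580",  # 门 -> 門
    "\u9a6c": "\u99ac",  # 马 -> 馬
    "\u56fd": "\u570b",  # 国 -> 國
}
# The same pairs in the other direction, for traditional -> simplified targets.
TRAD_TO_SIMP = {trad: simp for simp, trad in SIMP_TO_TRAD.items()}


def convert(text, target, table=SIMP_TO_TRAD, replacement="?"):
    """Encode text into the target encoding, falling back to a
    table-supplied counterpart when a character has no direct mapping.

    Returns the encoded bytes and a list of characters that could not
    be mapped at all.
    """
    out = bytearray()
    unmappable = []
    for ch in text:
        candidates = [ch]
        if ch in table:
            candidates.append(table[ch])
        for candidate in candidates:
            try:
                out += candidate.encode(target)
                break  # the first candidate that encodes wins
            except UnicodeEncodeError:
                continue
        else:
            # Neither the character nor its counterpart exists in the target.
            out += replacement.encode(target)
            unmappable.append(ch)
    return bytes(out), unmappable


data, missing = convert("\u95e8\u9a6c\U0001F600", "big5")
print(data.decode("big5"), missing)
```

Trying the original character first means that a target encoding that happens to include the simplified form keeps it as-is. The table is consulted only when the direct mapping fails. The returned list of unmappable characters makes the unavoidable leftovers visible, so they are not silently lost.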
ICU (International Components for Unicode),* Basis Technology's RCLU (Rosette Core
Library for Unicode),† available for Unix and Windows, tcs‡ (Translate Character Sets),
also available for Unix and Windows, and my own home-grown CJKVConv.pl§ (written in
Perl) are examples of tools or libraries ...