
604
|
Chapter 9: Information Processing Techniques
As you can see from these examples, this problem varies in intensity depending on the
encoding method, and even on the software you are using: compare the two types of output
you get for ISO-2022 encoding, using different applications. Some encodings require
slightly more overhead than simplistically treating multiple-byte characters as an inseparable
unit. For example, when dealing with ISO-2022 encoding, you must also remember
to insert and perhaps even delete escape sequences.
Character Attribute Detection Using C Macros
A useful function often supported in CJKV text-processing programs (or, for that matter,
in most text-processing systems) is the ability to determine the attributes of characters
within a file. For example, it is often convenient to obtain a listing of the numbers of Chinese
characters, kana, and other characters in a file. One can even break those categories
down further, such as kana into katakana and hiragana, ideographs into separate levels,
and so on.
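The kind of listing described above can be sketched in a few lines of C. The following is a minimal illustration, not a definitive implementation. It assumes the input is EUC-JP encoded, so that each JIS X 0208 character occupies two bytes and the first byte alone identifies the row: 0xA4 for hiragana, 0xA5 for katakana, 0xB0 through 0xCF for level 1 kanji, and 0xD0 through 0xF4 for level 2 kanji. The names attr_counts and count_attributes are hypothetical, and half-width katakana and JIS X 0212 characters are simply counted as "other."

```c
#include <assert.h>
#include <stddef.h>

/* Tallies for one pass over a buffer (hypothetical struct, for illustration). */
struct attr_counts {
    size_t hiragana;
    size_t katakana;
    size_t kanji_level1;
    size_t kanji_level2;
    size_t other;
};

/* Assumption: buf holds EUC-JP text. The first byte of a two-byte
   JIS X 0208 character is enough to classify it by row. */
void count_attributes(const unsigned char *buf, size_t len,
                      struct attr_counts *out)
{
    size_t i = 0;

    assert(buf != NULL && out != NULL);
    out->hiragana = out->katakana = 0;
    out->kanji_level1 = out->kanji_level2 = out->other = 0;

    while (i < len) {
        unsigned char c1 = buf[i];
        size_t remaining = len - i;

        if (c1 == 0x8E && remaining >= 2) {          /* SS2: half-width katakana */
            out->other++;
            i += 2;
        } else if (c1 == 0x8F && remaining >= 3) {   /* SS3: JIS X 0212 */
            out->other++;
            i += 3;
        } else if (c1 >= 0xA1 && c1 <= 0xFE && remaining >= 2) {
            if (c1 == 0xA4)
                out->hiragana++;
            else if (c1 == 0xA5)
                out->katakana++;
            else if (c1 >= 0xB0 && c1 <= 0xCF)
                out->kanji_level1++;
            else if (c1 >= 0xD0 && c1 <= 0xF4)
                out->kanji_level2++;
            else
                out->other++;                        /* symbols, Greek, and so on */
            i += 2;                                  /* skip the second byte */
        } else {
            out->other++;                            /* ASCII or a stray byte */
            i += 1;
        }
    }
}
```

Classifying on the first byte keeps the loop short, but it depends on the buffer starting at a character boundary. A caller would fill the buffer from a file and print the five tallies after the call.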
The C programming language has a useful macro facility that allows programmers to
specify simple commands that can be used often within a program. Macros are similar in
concept to functions, but require less work, although perhaps more thought.
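To make the "more thought" caveat concrete, here is a small sketch comparing a macro with an equivalent function. The names IN_RANGE, in_range, and macro_advance_count are hypothetical and chosen only for illustration. Because a macro is expanded textually, each parameter should be wrapped in parentheses, and an argument with a side effect may be evaluated more than once.

```c
#include <assert.h>

/* Hypothetical macro: nonzero if c lies in the inclusive range [lo, hi].
   Every parameter is parenthesized so that operator precedence in the
   caller's expression cannot change the meaning. */
#define IN_RANGE(c, lo, hi) ((c) >= (lo) && (c) <= (hi))

/* The same test as a function: arguments are evaluated exactly once. */
int in_range(int c, int lo, int hi)
{
    return c >= lo && c <= hi;
}

/* Shows the pitfall: the macro expands *p++ twice, so when the first
   comparison succeeds the pointer advances two bytes, not one. */
int macro_advance_count(void)
{
    const unsigned char buf[] = { 0x24, 0x22, 0x00 };
    const unsigned char *p = buf;

    (void)IN_RANGE(*p++, 0x21, 0x7E);
    assert(p - buf == 2);   /* advanced twice, not once */
    return (int)(p - buf);
}
```

The usual remedy is to read the byte into a variable first and pass that variable to the macro.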
As an example, several C macro definitions for detecting the attributes of Japanese (JIS X
0208:1997) characters are pro