
              shifted_in = TRUE; /* Change to two-byte mode. */
              break;
            default :
              break;
          }
          fprintf(out,"%c%c%c",p1,p2,p3); /* Print the escape sequence. */
        }
      }
      else if (p1 == '$') {
        p2 = getc(in);
        switch (p2) {
          case 'B' : /* JIS X 0208-1983. */
          case '@' : /* JIS C 6226-1978. */
            shifted_in = TRUE; /* Change to two-byte mode. */
            fprintf(out,"%c%c%c",ESC,p1,p2); /* Print the escape sequence. */
            break;
          default :
            fprintf(out,"%c%c",p1,p2); /* Print p1 and p2. */
            break;
        }
      }
      else
        fprintf(out,"%c",p1); /* Print p1. */
    }
  }
}
Yep, you guessed it, Appendix C provides a similar program, but written in Perl. I encourage
you to compare and contrast the C and Java examples in this chapter with Perl versions
in Appendix C.
Byte Versus Character Handling
Most Western encoding methods have the luxury of assuming that one byte equals one
character, so inserting, deleting, and searching text becomes a simple matter of comparing
one byte with another. However, this is not the case with encodings that require more
than one byte to represent a single character, such as those used for representing CJKV
text. Life gets much more complex! A multiple-byte character is still a character. Consider
it an atomic unit. After all, you would gawk at Western-style software that split characters
into four-bit units for some strange design reason. What I discuss next falls into what I
would cal ...
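
To make the atomic-unit idea concrete, here is a minimal sketch in C that counts characters rather than bytes. It assumes the input is a NUL-terminated string in EUC-JP encoding, and the names eucjp_char_len and eucjp_char_count are hypothetical helpers made up for this illustration, not functions from this chapter or from any library. It does not validate trail bytes; it only shows that the lead byte decides how many bytes belong to one character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes occupied by the EUC-JP
   character whose first byte is s[0]. Trail bytes are not validated. */
static size_t eucjp_char_len(const unsigned char *s)
{
  if (s[0] == 0x8F)                 /* SS3 + two bytes: JIS X 0212. */
    return 3;
  if (s[0] == 0x8E)                 /* SS2 + one byte: half-width katakana. */
    return 2;
  if (s[0] >= 0xA1 && s[0] <= 0xFE) /* Two bytes: JIS X 0208. */
    return 2;
  return 1;                         /* One byte: ASCII/JIS-Roman (or unknown). */
}

/* Hypothetical helper: number of characters, not bytes, in an
   EUC-JP string. Each multiple-byte character counts exactly once. */
static size_t eucjp_char_count(const char *text)
{
  const unsigned char *s;
  size_t remaining, count = 0;

  assert(text != NULL);
  s = (const unsigned char *)text;
  remaining = strlen(text);

  while (remaining > 0) {
    size_t len = eucjp_char_len(s);
    if (len > remaining)            /* Truncated final character. */
      len = remaining;
    s += len;                       /* Step over the whole character. */
    remaining -= len;
    count++;
  }
  return count;
}
```

For the EUC-JP bytes C6 FC CB DC B8 EC (the three characters 日本語) followed by the five ASCII bytes of " text", strlen reports 11 bytes, whereas eucjp_char_count reports 8 characters. Any insert, delete, or search routine would need to step through the text in the same character-sized strides.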