
Chapter 9: Information Processing Techniques
As you’ll soon realize, the latest Java APIs simplify this task immensely. Programming
languages such as Java can perform much of the necessary code conversion through the
use of built-in methods, which saves programmers significant time, effort, and energy.
There are, however, specialized algorithms that may or may not be available as built-in
methods or functions, such as half- to full-width katakana conversion (Japanese-specific)
and automatic code detection.* But even implementing these algorithms in Java provides
much simplification due to the use of Unicode (the UTF-16 encoding form) internally.
This chapter continues with information about handling multiple bytes as a single unit for
operations such as text insertion, deletion, and searching. CJKV implications for sorting,
parsing, and regular expressions are covered at the end of the chapter.
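
To make the point about built-in conversion concrete, here is a minimal Java sketch. It is an illustration only, not the code used later in this chapter. The class and method names (ConversionSketch, transcode, foldHalfWidthKatakana, characterCount) are hypothetical, and the sketch assumes that the Shift_JIS and EUC-JP charsets are installed in the JDK at hand. Note also that NFKC normalization stands in for a dedicated half- to full-width katakana routine: it folds other compatibility characters as well, so it is not a drop-in replacement for one.

```java
import java.nio.charset.Charset;
import java.text.Normalizer;

class ConversionSketch {
    // Assumption: the Shift_JIS and EUC-JP charsets are installed in this JDK
    // (Charset.forName throws an exception if a charset is missing).
    static byte[] transcode(byte[] input, String from, String to) {
        // Decoding yields a UTF-16 String, which serves as the pivot representation.
        String text = new String(input, Charset.forName(from));
        return text.getBytes(Charset.forName(to));
    }

    // NFKC folds half-width katakana to full-width, but it also folds other
    // compatibility characters (for example, full-width Latin letters become ASCII).
    static String foldHalfWidthKatakana(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    // Counts characters as code points rather than as UTF-16 code units.
    static int characterCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // The three kanji for "Japanese (language)", written as Unicode escapes.
        String nihongo = "\u65E5\u672C\u8A9E";
        byte[] sjis = nihongo.getBytes(Charset.forName("Shift_JIS"));
        byte[] euc = transcode(sjis, "Shift_JIS", "EUC-JP");
        System.out.println(new String(euc, Charset.forName("EUC-JP")).equals(nihongo));

        // Half-width KA (U+FF76) followed by the half-width voiced sound mark (U+FF9E).
        System.out.println(foldHalfWidthKatakana("\uFF76\uFF9E").equals("\u30AC"));

        // U+20BB7 lies outside the BMP, so it occupies two UTF-16 code units.
        String rare = new String(Character.toChars(0x20BB7));
        System.out.println(rare.length() + " code units, " + characterCount(rare) + " character");
    }
}
```

The last case shows why "one character" and "one code unit" should not be treated as the same thing, even in UTF-16: insertion, deletion, and searching need to respect surrogate pairs in much the same way that legacy encodings require respecting multiple-byte characters.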
In most cases, workable C or Java source code is given, along with an explanation of the
algorithm (Appendix C provides Perl equivalents of some of these algorithms). Feel free to
use these code fragments in your own programs; that is why they are included in this
book. The code samples that I provide here may not be the most efficient code, but they
do work.† Feel free to adapt what you find here to suit your own programming style or
taste. The entire source code for these algorithms and exam ...