
performs script-to-script transliteration.
CLDR, also mentioned earlier in this chapter, is
useful in that it provides transliterations and translations of country, language, and script
names.
Regular Expressions
Regular expressions (regexes is a short way to express this; seiki hyōgen in Japanese)
provide a very powerful mechanism to search for, replace, shred, or otherwise manipulate
text or data. The most common regex engines, as found in popular Unix tools
such as awk, GNU Emacs, grep, Perl, Ruby, sed, Tcl, and so on, have no inherent
CJKV-specific capabilities. However, several CJKV-specific regex implementations have been
developed over the years. The most noteworthy of these include JPerl (Japanese Perl) and
GNU Emacs (version 20 or greater).
Adding CJKV or multiple-byte support to regex engines is a matter of being able to use
multiple-byte characters in places where one-byte characters are expected. This may sound
simple enough at first glance, but there is much complexity to consider. The character-class
feature of regexes, for example, is an immediate candidate for this sort of extension.
The following is a typical regex character class definition in Perl:
/[0-9A-Fa-f]/
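As a minimal sketch of why multiple-byte text complicates such a class, the following applies the same class in Python (used here instead of Perl purely for illustration), first to raw Shift-JIS bytes and then to decoded characters. The variable names are hypothetical, and Shift-JIS is an assumed example encoding; the point is only that a byte-oriented engine can match the second byte of a two-byte character as though it were an ASCII letter.

```python
import re

# The hexadecimal-digit class from the text, as a character pattern
# and as an equivalent byte pattern.
HEX_CLASS = "[0-9A-Fa-f]"
HEX_CLASS_BYTES = HEX_CLASS.encode("ascii")

# Katakana A (U+30A2) is the two-byte sequence 0x83 0x41 in Shift-JIS.
# Its second byte, 0x41, is also the ASCII code for "A".
katakana_a = "\u30a2"
sjis_data = katakana_a.encode("shift_jis")

# Byte-oriented matching: the trailing byte is falsely reported as "A".
byte_hits = re.findall(HEX_CLASS_BYTES, sjis_data)

# Character-oriented matching: the text holds no hexadecimal digit.
char_hits = re.findall(HEX_CLASS, katakana_a)

# With character-oriented matching, a class may also hold a range of
# multiple-byte characters, here hiragana from U+3041 through U+3093.
HIRAGANA_CLASS = "[\u3041-\u3093]"
kana_hits = re.findall(HIRAGANA_CLASS, "\u304b\u306aABC")  # "ka", "na", "ABC"

print(byte_hits, char_hits, kana_hits)
```

The false match in the byte-oriented case is the kind of complexity the preceding paragraph alludes to: the engine must know where character boundaries fall before a character class can be applied safely.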
This character class includes any upper- or lowercase hexadecimal digits; the entire regex
(that is, what appears between the slashes) matches exactly one character in this character ...