while ((line = infile.readLine()) != null) {
    Matcher m = regexp.matcher(line);
    if (m.find()) {
International Components for Unicode (ICU)
The International Components for Unicode (ICU) activity is driven by major
software companies, but it also involves voluntary work and is based on the
open source principle. The ICU software consists of components (subroutines,
modules) that are available as source code and portable to different operating
systems. ICU is often characterized as a “project,” but by its nature it has to
be a continuous activity, in order to keep up with the development of the
Unicode standard and related specifications.
Originally released in 1999 as “IBM Classes for Unicode” and still substantially
supported by IBM and other vendors, ICU has become the first choice, wherever it
can be used, for building software that works with Unicode data. ICU was
originally written in Java, and support for C and C++ was added later. The Java
version is called ICU4J, and the C and C++ version is ICU4C.
The official ICU site is hosted at http://www.ibm.com/software/globalization/icu/. It
contains a handy “Getting started with ICU” section. The other key site,
http://icu.sourceforge.net/, is hosted by SourceForge, a development and download
repository for open source code and applications. The two sites link to each other
in many places, so you can start at either of them. ICU contains software
components for several areas:
Basic text
    Unicode text handling, character properties, and character code conversions
Text analysis
    Unicode regular expressions and characters, operations on collections (sets)
    of characters, and detection of word and line boundaries
Sorting and searching
    Language-sensitive collation and searching
Normalization forms, case mappings, transliterations
General locale data and resource bundle architecture
Complex text layout
    For example, Arabic, Hebrew, Indic, and Thai
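
To give a feel for three of the areas listed above (normalization, language-sensitive sorting, and word boundary detection), here is a minimal sketch in Java. Note the assumptions: it deliberately uses the standard JDK classes in java.text, not ICU4J itself, so that it runs without any extra library. ICU4J offers comparable and more complete classes in its com.ibm.icu.text package (for example Collator, BreakIterator, and Normalizer2). The class name IcuAreasSketch and its method names are invented for this illustration.

```java
import java.text.BreakIterator;
import java.text.Collator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustration only: JDK java.text classes stand in for their ICU4J counterparts.
public class IcuAreasSketch {

    // Normalization: convert a string to Normalization Form C (composed form).
    static String normalizeNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    // Sorting: return a copy of the list ordered by the locale's collation rules
    // instead of by raw code point values.
    static List<String> sortForLocale(List<String> words, Locale locale) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    // Text analysis: split text at word boundaries and keep only the segments
    // that begin with a letter or digit (dropping spaces and punctuation).
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9.
        System.out.println(normalizeNfc("e\u0301").length());   // prints 1
        System.out.println(sortForLocale(Arrays.asList("Zebra", "apple"), Locale.ENGLISH));
        System.out.println(words("Unicode is fun.", Locale.ENGLISH));
    }
}
```

A plain code point sort would place “Zebra” before “apple”, because uppercase Latin letters have smaller code numbers than lowercase ones. The collator orders the words the way an English reader expects.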