Printing All Occurrences of a Pattern
Problem
You need to find all the strings that match a given RE in one or more files or other sources.
Solution
This example reads through a file using a ReaderCharacterIterator, one of four CharacterIterator classes in the Jakarta RegExp package. Whenever a match is found, I extract it from the CharacterIterator and print it.
The other character iterators are StreamCharacterIterator (as we’ll see in Chapter 9, streams are 8-bit bytes, while readers handle conversion among various representations of Unicode characters), CharacterArrayCharacterIterator, and StringCharacterIterator. All of these character iterators are interchangeable; apart from the construction process, this program would work on any of them. Use a StringCharacterIterator, for example, to find all occurrences of a pattern in the (possibly long) string you get from a JTextArea’s getText( ) method, described in Chapter 13.
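For comparison, here is a minimal sketch of the "all occurrences in a string" case using the standard java.util.regex API instead of the Jakarta RegExp classes. This is a substitute technique, not the code this recipe describes. The class name FindAllInString, the sample text, and the pattern [A-Za-z][a-z]+ (one plausible reading of “names”) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the recipe's program.
final class FindAllInString {

    /** Returns every non-overlapping match of regex in text, in order. */
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        // Each call to find() resumes scanning just past the previous match.
        while (m.find()) {
            found.add(m.group()); // group() with no argument is the whole match
        }
        return found;
    }

    public static void main(String[] args) {
        // Stand-in for text you might obtain from a JTextArea's getText()
        String text = "import org.apache.regexp.*;";
        // Assumed pattern: a letter followed by one or more lowercase letters
        for (String word : findAll("[A-Za-z][a-z]+", text)) {
            System.out.println(word);
        }
    }
}
```

Run as written, the sketch prints import, org, apache, and regexp, one per line.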
This code takes the getParen( ) methods from Section 4.6, the substring( ) method from the CharacterIterator interface, and the match( ) method from the RE class, and simply puts them all together. I coded it to extract all the “names” from a given file; when I run the program on its own source file, it prints the words “import”, “org”, “apache”, “regexp”, and so on.
> jikes +E -d . ReaderIter.java
> java ReaderIter ReaderIter.java
import
org
apache
regexp
import
java
io
import
com
darwinsys
util
Debug
Demonstrate
the
Character
Iterator
interface
print
I interrupted it here ...
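If the Jakarta package is not available, a rough equivalent for Reader input can be sketched with java.util.regex. Again, this is a swapped-in technique and not the ReaderIter program itself. The class name FindAllInReader and the pattern are assumptions. Unlike a CharacterIterator over the whole input, this version scans one line at a time, so a match can never span a line break.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration; not the recipe's ReaderIter class.
final class FindAllInReader {

    /** Collects every match of patt found on each line read from source. */
    static List<String> findAll(Pattern patt, Reader source) {
        List<String> found = new ArrayList<>();
        // lines() reports read failures as UncheckedIOException
        new BufferedReader(source).lines().forEach(line -> {
            Matcher m = patt.matcher(line);
            while (m.find()) {
                found.add(m.group());
            }
        });
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Assumed pattern for "names": a letter, then lowercase letters
        Pattern patt = Pattern.compile("[A-Za-z][a-z]+");
        // args[0] is the name of the file to scan
        try (Reader r = new FileReader(args[0])) {
            for (String word : findAll(patt, r)) {
                System.out.println(word);
            }
        }
    }
}
```

Because findAll( ) accepts any Reader, the same method works on a FileReader, a StringReader, or an InputStreamReader wrapped around a stream.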