Printing All Occurrences of a Pattern

Problem

You need to find all the strings that match a given RE in one or more files or other sources.

Solution

This example reads through a file using a ReaderCharacterIterator, one of four CharacterIterator classes in the Jakarta RegExp package. Whenever a match is found, I extract it from the CharacterIterator and print it.

The other three character iterators are StreamCharacterIterator, CharacterArrayIterator, and StringCharacterIterator. (As we’ll see in Chapter 9, streams are 8-bit bytes, while readers handle conversion among various representations of Unicode characters.) All of these character iterators are interchangeable; apart from the construction of the iterator, this program would work unchanged with any of them. Use a StringCharacterIterator, for example, to find all occurrences of a pattern in the (possibly long) string you get from a JTextArea’s getText( ) method, described in Chapter 13.
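As a loose analogy only, the following sketch shows the same "one loop, any source" idea with the standard java.util.regex package instead of Jakarta RegExp. It is not the Jakarta API: here CharSequence stands in for the role CharacterIterator plays above, and the class name AnySourceSketch and the method words( ) are hypothetical names chosen for illustration.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch (java.util.regex, NOT Jakarta RegExp): one matching method, several sources. */
final class AnySourceSketch {

    /** Collects every match of the recipe's pattern from any CharSequence. */
    static List<String> words(CharSequence source) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z][a-z]+").matcher(source);
        // find() advances to the next match each time it is called.
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Hello brave world";
        // Only the construction of the source differs; the loop is shared.
        System.out.println(words(text));
        System.out.println(words(new StringBuilder(text)));
        System.out.println(words(CharBuffer.wrap(text.toCharArray())));
    }
}
```

Each of the three calls in main( ) hands the same method a different kind of character source, which is the property the paragraph above describes for the four Jakarta iterators.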

This code takes the getParenStart( ) and getParenEnd( ) methods from Section 4.6, the substring( ) method from the CharacterIterator interface, and the match( ) method from the RE class, and simply puts them all together. I coded it to extract all the “names” (here, a letter followed by one or more lowercase letters) from a given file; when run on its own source file, it prints the words “import”, “org”, “apache”, “regexp”, and so on.

> jikes +E -d . ReaderIter.java
> java ReaderIter ReaderIter.java
import
org
apache
regexp
import
java
io
import
com
darwinsys
util
Debug
Demonstrate
the
Character
Iterator
interface
print

I interrupted it here to save paper. The source code for this program is fairly short:

import org.apache.regexp.*;
import java.io.*;
import com.darwinsys.util.Debug;

/** Demonstrate the CharacterIterator interface: print
 * all the strings that match a given pattern from a file.
 */
public class ReaderIter {
    public static void main(String[] args) throws Exception {
        // The RE pattern
        RE patt = new RE("[A-Za-z][a-z]+");
        // A FileReader (see the I/O chapter)
        Reader r = new FileReader(args[0]);
        // The RE package ReaderCharacterIterator, a "front end"
        // around the Reader object.
        CharacterIterator in = new ReaderCharacterIterator(r);
        int end = 0;

        // For each match in the input, extract and print it.
        while (patt.match(in, end)) {
            // Get the starting position of the text
            int start = patt.getParenStart(0);
            // Get ending position; also updates for NEXT match.
            end = patt.getParenEnd(0);
            // Print whatever matched.
            Debug.println("match", "start=" + start + "; end=" + end);
            // Use CharacterIterator.substring(offset, end);
            System.out.println(in.substring(start, end));
        }
    }
}
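For readers without the Jakarta RegExp JAR at hand, here is a minimal sketch of the same offset-driven loop using the standard java.util.regex package on a String. This swaps in a different API and is not the code above: Matcher.find(int) stands in for patt.match(in, end), and start( ) and end( ) stand in for getParenStart(0) and getParenEnd(0). The class name OffsetLoopSketch and the method findAll( ) are hypothetical, and the loop assumes the pattern never matches an empty string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch (java.util.regex, NOT Jakarta RegExp): the offset-driven match loop. */
final class OffsetLoopSketch {

    /** Returns every match of patt in text, formatted as "start-end:matchedText". */
    static List<String> findAll(Pattern patt, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = patt.matcher(text);
        int end = 0;
        // find(int) resets the matcher and searches from the given offset.
        // Assumption: patt cannot match the empty string, so 'end' always advances.
        while (m.find(end)) {
            int start = m.start();   // where this match begins
            end = m.end();           // where it ends; also the next search offset
            found.add(start + "-" + end + ":" + text.substring(start, end));
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern patt = Pattern.compile("[A-Za-z][a-z]+");
        for (String match : findAll(patt, "import org.apache.regexp.*;")) {
            System.out.println(match);
        }
    }
}
```

Feeding the end offset of each match back in as the starting point of the next search is what lets both versions walk through the input one match at a time.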
