Java Cookbook by Ian F. Darwin

Printing All Occurrences of a Pattern

Problem

You need to find all the strings that match a given RE in one or more files or other sources.

Solution

This example reads through a file using a ReaderCharacterIterator, one of four CharacterIterator classes in the Jakarta RegExp package. Whenever a match is found, I extract it from the CharacterIterator and print it.

The other character iterators are StreamCharacterIterator, CharacterArrayCharacterIterator, and StringCharacterIterator. (As we’ll see in Chapter 9, streams are 8-bit bytes, while readers handle conversion among various representations of Unicode characters.) All of these character iterators are interchangeable; apart from the construction process, this program would work on any of them. Use a StringCharacterIterator, for example, to find all occurrences of a pattern in the (possibly long) string you get from a JTextArea’s getText( ) method, described in Chapter 13.
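For comparison, the same "one loop, any character source" idea can be sketched with the JDK's own java.util.regex package. This is a swapped-in API, not the Jakarta RegExp package this recipe uses. Pattern.matcher( ) accepts any CharSequence, so a String and a wrapped char array go through the same code. The names FindAllInText and findAll are made up for illustration.

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not from the book): collect every match
 * of a pattern in any CharSequence, using java.util.regex.
 */
public class FindAllInText {

    public static List<String> findAll(Pattern patt, CharSequence text) {
        List<String> found = new ArrayList<>();
        Matcher m = patt.matcher(text);
        // find() resumes just after the previous match, so there is
        // no need to track an explicit "end" offset by hand.
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern patt = Pattern.compile("[A-Za-z][a-z]+");

        // A String source
        System.out.println(findAll(patt, "import org.apache.regexp.*;"));
        // prints [import, org, apache, regexp]

        // A char[] source, wrapped so that it is a CharSequence
        char[] chars = "import java.io.*;".toCharArray();
        System.out.println(findAll(patt, CharBuffer.wrap(chars)));
        // prints [import, java, io]
    }
}
```

The recipe's own program, ReaderIter, stays with the Jakarta RegExp classes throughout.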

The ReaderIter program takes the getParen( ) family of methods from Section 4.6 (here getParenStart( ) and getParenEnd( )), the substring( ) method from the CharacterIterator interface, and the match( ) method from RE, and simply puts them all together. I coded it to extract all the “names” from a given file; run on its own source file, it prints the words “import”, “org”, “apache”, “regexp”, and so on.

> jikes +E -d . ReaderIter.java
> java ReaderIter ReaderIter.java
import
org
apache
regexp
import
java
io
import
com
darwinsys
util
Debug
Demonstrate
the
Character
Iterator
interface
print

I interrupted it here to save paper. The source code for this program is fairly short:

import org.apache.regexp.*;
import java.io.*;
import com.darwinsys.util.Debug;

/** Demonstrate the CharacterIterator interface: print
 * all the strings that match a given pattern from a file.
 */
public class ReaderIter {
    public static void main(String[] args) throws Exception {
        // The RE pattern
        RE patt = new RE("[A-Za-z][a-z]+");
        // A FileReader (see the I/O chapter)
        Reader r = new FileReader(args[0]);
        // The RE package ReaderCharacterIterator, a "front end"
        // around the Reader object.
        CharacterIterator in = new ReaderCharacterIterator(r);
        int end = 0;

        // For each match in the input, extract and print it.
        while (patt.match(in, end)) {
            // Get the starting position of the text
            int start = patt.getParenStart(0);
            // Get ending position; also updates for NEXT match.
            end = patt.getParenEnd(0);
            // Print whatever matched.
            Debug.println("match", "start=" + start + "; end=" + end);
            // Use CharacterIterator.substring(offset, end);
            System.out.println(in.substring(start, end));
        }
    }
}
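
If the Jakarta RegExp jar is not available, a roughly equivalent program can be sketched with the standard java.util.regex package. This is an alternative under stated assumptions, not the recipe's method: it reads the Reader line by line, so it assumes that no match spans a line boundary (true for this pattern, which cannot match a newline). The names ReaderFindAll and findAll are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical java.util.regex counterpart of ReaderIter:
 * collect all the strings that match a pattern, reading from a Reader.
 */
public class ReaderFindAll {

    /** Assumes no match spans a line boundary. The caller owns
     * (and closes) the Reader.
     */
    public static List<String> findAll(Pattern patt, Reader source) {
        List<String> found = new ArrayList<>();
        // One Matcher, reset for each line, instead of a new one per line
        Matcher m = patt.matcher("");
        // lines() reports any IOException as an UncheckedIOException
        new BufferedReader(source).lines().forEach(line -> {
            m.reset(line);
            while (m.find()) {
                found.add(m.group());
            }
        });
        return found;
    }

    public static void main(String[] args) throws IOException {
        Pattern patt = Pattern.compile("[A-Za-z][a-z]+");
        // Files.newBufferedReader decodes the file as UTF-8
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]))) {
            for (String word : findAll(patt, r)) {
                System.out.println(word);
            }
        }
    }
}
```

Because findAll( ) takes a plain Reader, a StringReader or an InputStreamReader can be passed in place of the file reader without changing the matching loop.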
