Using Regular Expressions in Java


You’re ready to utilize regular expression processing to beef up your Java code.


Use the Apache Jakarta Regular Expressions Package, org.apache.regexp.


As mentioned, the Apache project develops and maintains a regular expressions API. To ensure that you get the latest version, I don’t include it in the source archive for this book; you should download it from the Apache Jakarta project. The good news is that it’s actually easy to use. If all you need is to find out whether a given string matches an RE, just construct the RE and call its boolean match( ) method:

RE r = new RE(pattern); // Construct an RE object
boolean found = r.match(input); // Use it to match an input.
if (found) {
    // it matched... do something with it...
}

A complete program constructing an RE and using it to match( ) is shown here:

import org.apache.regexp.*;

/**
 * Simple example of using RE class.
 */
public class RESimple {
    public static void main(String[] argv) throws RESyntaxException {
        String pattern = "^Q[^u]\\d+\\.";
        String input = "QA777. is the next flight. It is on time.";

        RE r = new RE(pattern); // Construct an RE object

        boolean found = r.match(input); // Use it to match an input.

        System.out.println(pattern +
            (found ? " matches " : " doesn't match ") + input);
    }
}

The class RE provides the public API shown in Example 4-1. Unix users and Perl regulars may wish to skip this section, after glancing at the first few examples to see the syntactic details of how we’ve adapted regular expressions into the form of a Java API.

Example 4-1.  The Java Regular Expression API

/** The main public API of org.apache.regexp.RE.
 * Prepared in machine-readable form by javap and Ian Darwin.
 */
public class RE extends Object {
    // Constructors
    public RE(  );
    public RE(String patt) throws RESyntaxException;
    public RE(String patt, int flg) throws RESyntaxException;
    public RE(REProgram patt);
    public RE(REProgram patt, int flg);

    public boolean match(String in);
    public boolean match(String in, int index);
    public boolean match(CharacterIterator where, int index);
    public String[] split(String in);
    public String[] grep(Object[] in);
    public String subst(String in, String repl);
    public String subst(String in, String repl, int how);

    public String getParen(int level);
    public int getParenCount(  );
    public final int getParenEnd(int level);
    public final int getParenLength(int level);
    public final int getParenStart(int level);

    public int getMatchFlags(  );
    public void setMatchFlags(int flg);
    public REProgram getProgram(  );
    public void setProgram(REProgram prog);
}

This API is large enough to require some explanation. As you can see, there are several forms of the method called match( ) that return true or false. The simplest usage is to construct an RE and call its match( ) method against an input string, as in the RESimple program shown earlier. The constructor compiles the pattern it is given into a form that can be compared against the match( ) argument fairly efficiently; match( ) then goes through the string looking for a match. The overloaded form match(String in, int index) is the same, except that it starts matching at the given index, which lets you skip characters at the beginning of the string. The third form, which takes a CharacterIterator as its argument, will be covered in Section 4.8.
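If you don't have the Jakarta package on your classpath, the effect of the index argument can be sketched with the JDK's own java.util.regex package instead. This is a stand-in, not the org.apache.regexp API: I am assuming that Pattern.compile( ) plays roughly the role of the RE constructor, Matcher.find(int) that of match(String in, int index), and Matcher.group( ) that of getParen(0). The class name MatchFromIndex and the helper firstMatchFrom( ) are made up for this illustration, and the leading ^ is dropped from the pattern so that only the starting index decides the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Sketch only: shows "match starting at an index" using java.util.regex,
 * NOT org.apache.regexp. Class and method names are hypothetical.
 */
public class MatchFromIndex {

    // Same pattern as RESimple, minus the leading ^ anchor.
    static final String PATTERN = "Q[^u]\\d+\\.";
    static final String INPUT = "QA777. is the next flight. It is on time.";

    /**
     * Compile the pattern, then look for a match at or after index.
     * Returns the matched text, or null if nothing matched.
     */
    public static String firstMatchFrom(String pattern, String input, int index) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        return m.find(index) ? m.group() : null;
    }

    public static void main(String[] argv) {
        // Searching from the start finds the flight number.
        System.out.println(firstMatchFrom(PATTERN, INPUT, 0)); // prints QA777.

        // Skipping the first character skips the only Q, so nothing matches.
        System.out.println(firstMatchFrom(PATTERN, INPUT, 1)); // prints null
    }
}
```

The second call fails because the only Q in the input is at index 0; once you skip past it, there is nothing left for the pattern to match.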
