Finding the Matching Text

Problem

You need to find the text that matched the RE.

Solution

Sometimes you need to know more than just whether an RE matched an input string. In editors and many other tools, you will want to know exactly which characters were matched. Remember that with multipliers such as *, the length of the text that was matched may have no relationship to the length of the pattern that matched it. Do not underestimate the mighty .*, which will happily match thousands or millions of characters if allowed to. As you can see from looking at the API, you can find out whether a given match succeeds just by using match( ), as we’ve done up to now. But it is often more useful to retrieve the text that was actually matched by using one of the getParen( ) methods.
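To make the point about length concrete, here is a minimal sketch. It uses the standard java.util.regex package rather than the RE class used in this recipe, a substitution made only so the example runs without extra libraries. The class and method names (GreedyLength, matchedLength) are illustrative and do not come from the book's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyLength {
    /** Returns the length of the first match of patt in input, or -1 if there is none. */
    static int matchedLength(String patt, String input) {
        Matcher m = Pattern.compile(patt).matcher(input);
        return m.find() ? m.group().length() : -1;
    }

    public static void main(String[] args) {
        // Build an input of one 'a', then 1000 'x' characters, then one 'z'
        StringBuilder sb = new StringBuilder("a");
        for (int i = 0; i < 1000; i++) {
            sb.append('x');
        }
        sb.append('z');

        String patt = "a.*z"; // the pattern itself is only four characters long
        System.out.println("Pattern length: " + patt.length());
        System.out.println("Matched length: " + matchedLength(patt, sb.toString()));
    }
}
```

The four-character pattern matches all 1002 characters of the input, because .* consumes everything between the a and the z.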

The notion of parentheses is central to RE processing. Parenthesized subexpressions may be nested to any depth, and the getParen( ) methods let you retrieve whatever matched at a given parenthesis level. If you haven’t used any explicit parens, you can just treat whatever matched as “level zero.” For example:

// Part of REmatch.java
String patt = "Q[^u]\\d+\\.";
RE r = new RE(patt);
String line = "Order QT300. Now!";
if (r.match(line)) {
    System.out.println(patt + " matches '" +
        r.getParen(0) +
        "' in '" + line + "'");
}

When run, this prints:

Q[^u]\d+\. matches 'QT300.' in 'Order QT300. Now!'
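For comparison, the same "level zero" lookup can be sketched with the standard java.util.regex package instead of the RE class shown above. This is an assumed equivalent, not part of REmatch.java. The class and method names are made up for illustration. Here find( ) plays the role of match( ), since both search anywhere in the input, and group(0) plays the role of getParen(0).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeMatch {
    /** Returns the text matched by patt anywhere in line, or null if nothing matches. */
    static String matchedText(String patt, String line) {
        Matcher m = Pattern.compile(patt).matcher(line);
        // find() searches anywhere in the input; group(0) is the entire match
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        String patt = "Q[^u]\\d+\\.";
        String line = "Order QT300. Now!";
        String found = matchedText(patt, line);
        if (found != null) {
            System.out.println(patt + " matches '" + found + "' in '" + line + "'");
        }
    }
}
```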

It is also possible to get the starting and ending indexes, and hence the length, of the text that the pattern matched (remember that \d+ can match one or more digits, so the length is not known in advance). You can use these in conjunction with the String.substring( ) methods as follows:

// Part of REsubstr.java -- Prints exactly the same as REmatch.java
if (r.match(line)) {
    System.out.println(patt + " matches '" +
        line.substring(r.getParenStart(0), r.getParenEnd(0)) +
        "' in '" + line + "'");
}
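The index arithmetic can be checked with a small standalone sketch. It again uses java.util.regex as a stand-in for the RE class, with start( ) and end( ) as the assumed counterparts of getParenStart(0) and getParenEnd(0). The helper name matchBounds is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchOffsets {
    /** Returns {start, end} of the first match of patt in line, or null if there is none. */
    static int[] matchBounds(String patt, String line) {
        Matcher m = Pattern.compile(patt).matcher(line);
        if (!m.find()) {
            return null;
        }
        // end() is the index just past the last matched character,
        // so it can be passed directly to String.substring()
        return new int[] { m.start(), m.end() };
    }

    public static void main(String[] args) {
        String patt = "Q[^u]\\d+\\.";
        String line = "Order QT300. Now!";
        int[] b = matchBounds(patt, line);
        if (b != null) {
            System.out.println("start=" + b[0] + " end=" + b[1]
                + " length=" + (b[1] - b[0]));
            System.out.println("text='" + line.substring(b[0], b[1]) + "'");
        }
    }
}
```

Because the end index is exclusive, the length of the match is simply end minus start.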

Suppose you need to extract several items from a string. If the input is:

Smith, John
Adams, John Quincy

and you want to get out:

John Smith
John Quincy Adams

just use:

// from REmatchTwoFields.java
// Construct an RE with parens to "grab" both field1 and field2
RE r = new RE("(.*), (.*)");
if (!r.match(inputLine))
    throw new IllegalArgumentException("Bad input: " + inputLine);
System.out.println(r.getParen(2) + ' ' + r.getParen(1));
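A self-contained version of the same idea, runnable against both sample lines, might look like the following. It is a sketch that substitutes java.util.regex for the RE class, with group(1) and group(2) standing in for getParen(1) and getParen(2). The class name SwapFields and the method firstLast are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SwapFields {
    // Two parenthesized groups: text before ", " and text after it
    private static final Pattern NAME = Pattern.compile("(.*), (.*)");

    /** Converts "Last, First" into "First Last"; rejects input without ", ". */
    static String firstLast(String inputLine) {
        Matcher m = NAME.matcher(inputLine);
        if (!m.find()) {
            throw new IllegalArgumentException("Bad input: " + inputLine);
        }
        return m.group(2) + ' ' + m.group(1);
    }

    public static void main(String[] args) {
        String[] inputs = { "Smith, John", "Adams, John Quincy" };
        for (String inputLine : inputs) {
            System.out.println(firstLast(inputLine));
        }
    }
}
```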
