Java Cookbook by Ian F. Darwin

Matching Newlines in Text

Problem

You need to match newlines in text.

Solution

Use \n or \r.

See also the flag constant RE.MATCH_MULTILINE, which makes ^ and $ match at the beginning and end of each line (that is, just after and just before a newline) as well as at the ends of the input.

Discussion

While line-oriented tools from Unix such as sed and grep match regular expressions one line at a time, not all tools do. The sam text editor from Bell Laboratories was the first interactive tool I know of to allow multiline regular expressions; the Perl scripting language followed shortly.

In our API, the newline character by default has no special significance. The BufferedReader method readLine() normally strips out whichever newline characters it finds, so a line read that way never contains one. If you read in gobs of characters using some method other than readLine(), you may have \n in your text string. Since it’s just an ordinary character, you can match it with .* or similar multipliers, and, if you want to know exactly where it is, \n or \r in the pattern will match it as well.

You can refer to a newline in a pattern either by the metacharacter \n or by its numerical value, \u000a.
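The difference between readLine() and a bulk read can be checked with the standard java.io classes alone. The following is a minimal sketch, not part of the original recipe: the class name ReadLineDemo and its two helper methods are made up for illustration, and a StringReader stands in for a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo: readLine() drops line endings, a bulk read keeps them. */
class ReadLineDemo {

    /** Reads text line by line; each returned line has its terminator removed. */
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Reads the text in chunks with read(char[]); newlines stay in the result. */
    static String readAll(String text) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "I dream of engines\nmore engines, all day long";

        // No line returned by readLine() contains a newline character.
        for (String line : readLines(text)) {
            System.out.println("readLine: [" + line + "] has newline? "
                    + line.contains("\n"));
        }

        // The bulk read still has the newline, at the position indexOf reports.
        String all = readAll(text);
        System.out.println("bulk read newline index: " + all.indexOf('\n'));

        // The escape and the numeric code point 0x0A name the same character.
        System.out.println("same character? " + ('\n' == (char) 0x0A));
    }
}
```

Only the string produced by readAll() gives a multiline pattern anything to match across; the lines from readLines() would have to be rejoined first.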

import org.apache.regexp.*;

/**
 * Show line ending matching using RE class.
 */
public class NLMatch {
    public static void main(String[] argv) throws RESyntaxException {

        String input = "I dream of engines\nmore engines, all day long";
        System.out.println("INPUT: " + input);
        System.out.println();

        String[] patt = {
            "engines\nmore engines",
            "engines$"
        };

        for (int i = 0; i < patt.length; i++) {
            System.out.println("PATTERN " + patt[i]);

            boolean found;
            RE r = new RE(patt[i]);

            found = r.match(input);
            System.out.println("DEFAULT match " + found);

            r.setMatchFlags(RE.MATCH_MULTILINE);
            found = r.match(input);
            System.out.println("MATCH_MULTILINE match was " + found);
            System.out.println();
        }
    }
}

If you run this code, the first pattern (with the embedded \n) always matches, while the second pattern (with $) matches only when MATCH_MULTILINE is set.

> java NLMatch
INPUT: I dream of engines
more engines, all day long
 
PATTERN engines
more engines
DEFAULT match true
MATCH_MULTILINE match was true
 
PATTERN engines$
DEFAULT match false
MATCH_MULTILINE match was true
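
For readers without the org.apache.regexp package on hand, roughly the same experiment can be sketched with the JDK's own java.util.regex package. This is a different API from the one used in this recipe: the class name NLMatchJdk and the helper found() are hypothetical, and Pattern.MULTILINE is assumed here to play the role of RE.MATCH_MULTILINE for the purposes of this input.

```java
import java.util.regex.Pattern;

/** Hypothetical java.util.regex counterpart of the NLMatch experiment. */
class NLMatchJdk {

    /** True if regex, compiled with the given flags, is found anywhere in input. */
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "I dream of engines\nmore engines, all day long";
        System.out.println("INPUT: " + input);
        System.out.println();

        String[] patterns = {
            "engines\nmore engines",   // newline embedded as an ordinary character
            "engines$"                 // depends on where $ is allowed to match
        };

        for (String p : patterns) {
            System.out.println("PATTERN " + p);
            // Flags value 0 means no flags: $ is tied to the end of the input.
            System.out.println("DEFAULT match " + found(p, 0, input));
            // With MULTILINE, $ also matches just before a line terminator.
            System.out.println("MULTILINE match was "
                    + found(p, Pattern.MULTILINE, input));
            System.out.println();
        }
    }
}
```

For this input the sketch reports the same four results as the run above: the embedded-newline pattern is found either way, and engines$ is found only with the multiline flag.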
