Regular Expressions and Character Decoding

Example 6-3 demonstrates the text-matching capabilities of the java.util.regex package. This BGrep class is a variant of the Unix “grep” command for searching files for text that matches a given regular expression. Unlike Unix grep, which is line-oriented, BGrep is block-oriented: the matched text can span multiple lines, and its location in the file is indicated by character number rather than line number. Invoke BGrep with the regular expression to search for and one or more filenames. Use -i to specify case-insensitive matching. If the files contain characters in some encoding other than UTF-8, use the -e option to specify the encoding. For example, you could use this command to search a bunch of Java source files for occurrences of “ByteBuffer”, “CharBuffer”, and the like:

java je3.nio.BGrep '[A-Z][a-z]*Buffer' *.java

The java.util.regex package uses a regular expression syntax that is much like that of Perl 5. Look up java.util.regex.Pattern in Sun’s javadocs or in Java in a Nutshell for a summary of this syntax, and look up the Matcher class in the same package for details on how to use Pattern objects to match character sequences. If you are not already familiar with regular expressions, you can find complete details in the book Mastering Regular Expressions, by Jeffrey Friedl (O’Reilly).
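To make the Pattern and Matcher roles concrete, here is a minimal sketch of block-oriented matching that reports character offsets. It is not the book's Example 6-3: the class name BlockMatchSketch and the findAll() helper are hypothetical names chosen for illustration, and the ignoreCase parameter only mimics what a -i option might do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the book's example code.
public class BlockMatchSketch {
    // Scans the whole text as one block (not line by line) and returns
    // one "offset: matched text" entry per match.
    static List<String> findAll(String regex, CharSequence text, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        Pattern pattern = Pattern.compile(regex, flags);
        Matcher matcher = pattern.matcher(text);
        List<String> hits = new ArrayList<>();
        while (matcher.find()) {
            // start() is the character index of the match within the text.
            hits.add(matcher.start() + ": " + matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "ByteBuffer in;\nCharBuffer out;\n";
        for (String hit : findAll("[A-Z][a-z]*Buffer", text, false)) {
            System.out.println(hit);
        }
    }
}
```

Because the matcher sees the entire character sequence at once, a pattern such as in;\s*char (compiled case-insensitively) can match text that straddles a line break, which a line-oriented tool would miss.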

This program also demonstrates an easy way to read the contents of a file: simply use the memory-mapping capabilities of FileChannel to map the ...
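One plausible reading of the memory-mapping approach can be sketched as follows. This is an assumption-laden illustration rather than the book's code: MappedFileSketch and readChars() are hypothetical names, and the sketch assumes the file fits in a single mapping (under 2 GB) and that the caller supplies the right Charset, as a -e option might.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the book's example code.
public class MappedFileSketch {
    // Maps the whole file read-only, then decodes its bytes to characters.
    // Assumes the file is small enough for one mapping (under 2 GB).
    static CharBuffer readChars(Path file, Charset charset) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer bytes = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // The mapping stays valid after the channel is closed.
            return charset.decode(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file so the sketch runs on its own.
        Path file = Files.createTempFile("sketch", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "ByteBuffer in;\nCharBuffer out;\n".getBytes(StandardCharsets.UTF_8));

        // CharBuffer implements CharSequence, so it can be matched directly.
        CharBuffer text = readChars(file, StandardCharsets.UTF_8);
        Matcher m = Pattern.compile("[A-Z][a-z]*Buffer").matcher(text);
        while (m.find()) {
            System.out.println(file + ": " + m.start() + ": " + m.group());
        }
    }
}
```

A named encoding supplied on the command line could be turned into a Charset with Charset.forName() before calling readChars().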

