Regular Expressions and Character Decoding
Example 6-3 demonstrates the text-matching capabilities of the java.util.regex package. This BGrep class is a variant of the Unix “grep” command for searching files for text that matches a given regular expression. Unlike Unix grep, which is line-oriented, BGrep is block-oriented: the matched text can span multiple lines, and its location in the file is indicated by character number rather than line number. Invoke BGrep with the regular expression to search for and one or more filenames. Use -i to specify case-insensitive matching. If the files contain characters in some encoding other than UTF-8, use the -e option to specify the encoding. For example, you could use this command to search a bunch of Java source files for occurrences of “ByteBuffer”, “CharBuffer”, and the like:
java je3.nio.BGrep '[A-Z][a-z]*Buffer' *.java
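To make the block-oriented behavior concrete, here is a minimal sketch of matching against a whole block of text and reporting character offsets. It is not the BGrep code from Example 6-3: the class name BlockMatchSketch and the helper findAll are hypothetical, and the ignoreCase parameter only stands in for the effect of the -i option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the BGrep class from the book.
class BlockMatchSketch {

    // Returns one "offset: matched text" entry per match in the text.
    // The whole text is searched as a single block, so a match may
    // contain line terminators.
    static List<String> findAll(String regex, CharSequence text, boolean ignoreCase) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        Pattern pattern = Pattern.compile(regex, flags);
        Matcher matcher = pattern.matcher(text);
        List<String> hits = new ArrayList<>();
        while (matcher.find()) {
            // start() is a character index into the text, not a line number
            hits.add(matcher.start() + ": " + matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "ByteBuffer b;\nCharBuffer c;";
        for (String hit : findAll("[A-Z][a-z]*Buffer", text, false)) {
            System.out.println(hit);
        }
        // prints:
        // 0: ByteBuffer
        // 14: CharBuffer

        // \s matches a newline, so this match spans two lines.
        System.out.println(findAll("static\\s+final", "a static\nfinal b", false).size());
        // prints: 1
    }
}
```

Because the matcher sees the text as one character sequence, a pattern such as static\s+final matches even when the two words sit on different lines, which a line-oriented tool would miss.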
The java.util.regex package uses a regular expression syntax that is much like that of Perl 5. Look up java.util.regex.Pattern in Sun’s javadocs or in Java in a Nutshell for a summary of this syntax, and look up the Matcher class in the same package for details on how to use Pattern objects to match character sequences. If you are not already familiar with regular expressions, you can find complete details in the book Mastering Regular Expressions, by Jeffrey Friedl (O’Reilly).
This program also demonstrates an easy way to read the contents of a file: simply use the memory-mapping capabilities of FileChannel to map the ...
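As a rough illustration of the memory-mapping approach mentioned above, the following is a minimal sketch that maps a file, decodes its bytes with a named charset, and searches the resulting characters. It is an assumption-laden stand-in, not the code of Example 6-3: the class MappedGrepSketch and the helpers readChars and firstMatchOffset are hypothetical names, and the sketch uses java.nio.file.Path and FileChannel.open, which are newer than the APIs available when the book was written.

```java
import java.io.IOException;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the BGrep class from the book.
class MappedGrepSketch {

    // Maps the whole file read-only and decodes its bytes into characters.
    // Charset.decode substitutes a replacement character for malformed input.
    static CharBuffer readChars(Path file, Charset charset) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer bytes =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return charset.decode(bytes);
        }
    }

    // Returns the character offset of the first match, or -1 if there is none.
    static int firstMatchOffset(CharSequence text, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.start() : -1;
    }

    // Usage (assumed argument order): <regex> <file> [encoding]
    public static void main(String[] args) throws IOException {
        Charset charset = args.length > 2 ? Charset.forName(args[2])
                                          : Charset.forName("UTF-8");
        CharBuffer text = readChars(Paths.get(args[1]), charset);
        System.out.println(args[1] + ": " + firstMatchOffset(text, args[0]));
    }
}
```

CharBuffer implements CharSequence, so the decoded file contents can be handed directly to Pattern.matcher without first being copied into a String.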