Java 9 Regular Expressions

Book Description

Solve real-world problems using regex in Java.

About This Book

  • Discover regular expressions and how they work
  • Implement regular expressions with Java in your code base
  • Learn to use regular expressions with emails, URLs, paths, and IP addresses
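To show what the last item can look like in practice, here is a minimal sketch of IPv4 address validation with java.util.regex. The class name Ipv4Check and the exact pattern are illustrative assumptions, not code from the book. The pattern rejects octets above 255 and octets with leading zeros, and it does not attempt to handle IPv6.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: Ipv4Check is a hypothetical class name and the
// pattern is one common way to write an IPv4 check, not the book's code.
class Ipv4Check {
    // One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";

    // Four octets separated by literal dots (the dot must be escaped)
    private static final Pattern IPV4 =
            Pattern.compile(OCTET + "(?:\\." + OCTET + "){3}");

    static boolean isIpv4(String s) {
        // matches() requires the entire input to match, so no ^ or $ anchors are needed
        return IPV4.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"192.168.0.1", "256.1.1.1", "10.0.0", "01.2.3.4"};
        for (String s : samples) {
            System.out.println(s + " -> " + isIpv4(s));
        }
    }
}
```

Running main reports only the first sample as valid: 256 is out of range, 10.0.0 has three octets, and 01 has a leading zero.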

Who This Book Is For

This book is for Java developers who would like to understand and use regular expressions. A basic knowledge of Java is assumed.

What You Will Learn

  • Understand the semantics, rules, and core concepts of writing Java code involving regular expressions
  • Learn about the java.util.regex package, including the Pattern and Matcher classes, through code snippets and more
  • Match and capture text in regex and use back-references to the captured groups
  • Explore regex using Java String methods and the regex capabilities of the Java Scanner API
  • Use zero-width assertions and lookarounds in regex
  • Test and optimize poorly performing regexes, and apply various other performance tips
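As a small illustration of the Pattern and Matcher classes, named groups, and back-references listed above, the following sketch finds and removes doubled words such as "the the". The class and method names are hypothetical; the \k<name> back-reference and ${name} replacement syntax are standard java.util.regex features (available since Java 7).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: DoubledWords and its methods are hypothetical names.
class DoubledWords {
    // The named group "word" captures a word; \k<word> is a back-reference
    // that must match the same text again after some whitespace.
    private static final Pattern DOUBLED =
            Pattern.compile("\\b(?<word>\\w+)\\s+\\k<word>\\b", Pattern.CASE_INSENSITIVE);

    static List<String> findDoubled(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DOUBLED.matcher(text);
        while (m.find()) {              // find() must succeed before group() is called
            found.add(m.group("word")); // text captured by the named group
        }
        return found;
    }

    static String removeDoubled(String text) {
        // ${word} in the replacement string refers to the named group's captured text
        return DOUBLED.matcher(text).replaceAll("${word}");
    }

    public static void main(String[] args) {
        String text = "This is is a test of the the Matcher class.";
        System.out.println(findDoubled(text));
        System.out.println(removeDoubled(text));
    }
}
```

Because the pattern is compiled with CASE_INSENSITIVE, the back-reference is also compared case-insensitively, so "The the" is reported as well.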

In Detail

Regular expressions are a powerful tool in the programmer's toolbox and allow pattern matching. They are also used for manipulating text and data. This book will provide you with the know-how (and practical examples) to solve real-world problems using regex in Java.
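Much everyday pattern matching and text manipulation can be done directly through the String and Scanner APIs. The following is a minimal sketch with hypothetical class and method names wrapping the standard String.matches, String.replaceAll, String.split, and Scanner.useDelimiter calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

// Illustrative sketch: the class and method names are hypothetical; only the
// String and Scanner calls are standard Java APIs.
class StringRegexDemo {
    // matches(): true only if the whole string matches (here, the shape yyyy-mm-dd)
    static boolean looksLikeIsoDate(String s) {
        return s.matches("\\d{4}-\\d{2}-\\d{2}");
    }

    // replaceAll(): collapse every run of whitespace into a single space
    static String collapseSpaces(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // split(): break on commas, absorbing any whitespace around them
    static String[] splitCsvLine(String line) {
        return line.split("\\s*,\\s*");
    }

    // Scanner with a regex delimiter: read integers separated by ';' or ','
    static List<Integer> readInts(String input) {
        List<Integer> values = new ArrayList<>();
        try (Scanner sc = new Scanner(input).useDelimiter("\\s*[;,]\\s*")) {
            while (sc.hasNextInt()) {
                values.add(sc.nextInt());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIsoDate("2017-09-25"));
        System.out.println(collapseSpaces("too    many   spaces"));
        System.out.println(Arrays.toString(splitCsvLine("a , b,c ,d")));
        System.out.println(readInts("1;2, 3;4"));
    }
}
```

looksLikeIsoDate checks only the shape of the text. A string such as 2017-13-45 still passes, so real date validation needs a date parser.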

You will begin by discovering what regular expressions are and how they work with Java. This easy-to-follow guide is a great place to familiarize yourself with the core concepts of regular expressions and to master their implementation with the features of Java 9. You will learn how to match, extract, and transform text by matching specific words, characters, and patterns. You will learn when and where to apply the methods for finding patterns in digits, letters, Unicode characters, and string literals. Going forward, you will learn to use zero-width assertions and lookarounds, parse source code, and process log files. Finally, you will master tips, tricks, and best practices in regex with Java.
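Zero-width assertions and lookarounds match positions rather than characters. The following minimal sketch shows two typical uses: inserting thousands separators, and pulling values out of a log line. The "user=name" log format and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: class and method names and the "user=" log format are hypothetical.
class LookaroundDemo {
    // (?<=\d)               lookbehind: a digit is immediately to the left
    // (?=(?:\d{3})+(?!\d))  lookahead: one or more groups of three digits follow,
    //                       with no further digit after them
    // Both assertions are zero-width, so replaceAll inserts "," at each matching
    // position without consuming any characters.
    private static final Pattern THOUSANDS =
            Pattern.compile("(?<=\\d)(?=(?:\\d{3})+(?!\\d))");

    // Positive lookbehind: match a word only if "user=" directly precedes it
    private static final Pattern USER = Pattern.compile("(?<=user=)\\w+");

    static String addThousandsSeparators(String s) {
        return THOUSANDS.matcher(s).replaceAll(",");
    }

    static List<String> usersInLog(String log) {
        List<String> users = new ArrayList<>();
        Matcher m = USER.matcher(log);
        while (m.find()) {
            users.add(m.group()); // only the name; "user=" was checked but not consumed
        }
        return users;
    }

    public static void main(String[] args) {
        System.out.println(addThousandsSeparators("Total: 1234567 items, 999 left"));
        System.out.println(usersInLog("user=alice action=login\nuser=bob action=logout"));
    }
}
```

The thousands-separator pattern is intended for plain integers. Digits after a decimal point would also receive separators.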

Style and approach

This book will take readers through this learning journey using simple, easy-to-understand, step-by-step instructions and hands-on examples at every stage.

Downloading the example code for this book: you can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to obtain the code files.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Errata
      3. Piracy
      4. Questions
  2. Getting Started with Regular Expressions
    1. Introduction to regular expressions
      1. A bit of history of regular expressions
      2. Various flavors of regular expressions
      3. What types of problems need regular expressions to solve
      4. The basic rules of regular expressions
      5. Constructs of the standard regular expression and metacharacters
      6. Some basic regular expression examples
      7. Eager matching
        1. The effect of eager matching on regular expression alternation
    2. Summary
  3. Understanding the Core Constructs of Java Regular Expressions
    1. Understanding the core constructs of regular expressions
    2. Quantifiers
      1. Basic quantifiers
        1. Examples using quantifiers
      2. Greedy versus reluctant (lazy) matching using quantifiers
      3. Possessive quantifiers
      4. Boundary constructs
        1. Examples using boundary constructs
      5. Character classes
        1. Examples of character classes
        2. Range inside a character class
          1. Examples of character range
      6. Escaping special regex metacharacters and escaping rules inside the character classes
        1. Escaping inside a character class
          1. Examples of escaping rules inside the character class
        2. Literally matching a string that may contain special regex metacharacters
        3. Negated character classes
          1. Examples of negated character classes
      7. Predefined shorthand character classes
        1. POSIX character classes
      8. Unicode support in Java regular expressions
        1. Commonly used Unicode character properties
        2. Negation of the preceding regex directives
        3. Unicode scripts support
          1. Examples of matching Unicode text in regular expressions
        4. Double escaping in a Java String when defining regular expressions
        5. Embedded regular expression mode modifiers
        6. The placement of embedded modes in a Java regular expression
        7. Disabling mode modifiers
    3. Summary
  4. Working with Groups, Capturing, and References
    1. Capturing groups
      1. Group numbering
      2. Named groups
    2. Non-capturing groups
      1. Advantages of non-capturing groups
    3. Back references
      1. Back reference of a named group
      2. Replacement reference of a named group
      3. Forward references
      4. Invalid (non-existing) backward or forward references
    4. Summary
  5. Regular Expression Programming Using Java String and Scanner APIs
    1. Introduction to the Java String API for regular expressions' evaluation
      1. Method - boolean matches(String regex)
        1. Example of the matches method
      2. Method - String replaceAll(String regex, String replacement)
        1. Examples of the replaceAll method
      3. Method - String replaceFirst(String regex, String replacement)
        1. Examples of the replaceFirst method
      4. Methods - String split methods
        1. The limit parameter rules
        2. Examples of the split method
          1. Example of the split method using the limit parameter
    2. Using regular expressions in Java Scanner API
    3. Summary
  6. Introduction to Java Regular Expression APIs - Pattern and Matcher Classes
    1. The MatchResult interface
    2. The Pattern class
      1. Examples using the Pattern class
        1. Filtering a list of tokens using the asPredicate() method
    3. The Matcher class
      1. Examples using the Matcher class
        1. Method boolean lookingAt()
        2. The matches() method
        3. The find() and find(int start) methods
      2. The appendReplacement(StringBuffer sb, String replacement) method
      3. The appendTail(StringBuffer sb) method
      4. Example of the appendReplacement and appendTail methods
    4. Summary
  7. Exploring Zero-Width Assertions, Lookarounds, and Atomic Groups
    1. Zero-width assertions
      1. Predefined zero-width assertions
      2. Regex defined zero-width assertions
    2. \G boundary assertion
    3. Atomic groups
    4. Lookahead assertions
      1. Positive lookahead
      2. Negative lookahead
    5. Lookbehind assertions
      1. Positive lookbehind
      2. Negative lookbehind
      3. Capturing text from overlapping matches
      4. Be careful with capturing groups inside a lookahead or lookbehind atomic group
        1. Lookbehind limitations in Java regular expressions
    6. Summary
  8. Understanding the Union, Intersection, and Subtraction of Character Classes
    1. The union of character classes
    2. The intersection of character classes
    3. The subtraction of character classes
    4. Why should you use composite character classes?
    5. Summary
  9. Regular Expression Pitfalls, Optimization, and Performance Improvements
    1. Common pitfalls and ways to avoid them while writing regular expressions
      1. Do not forget to escape regex metacharacters outside a character class
      2. Avoid escaping every non-word character
      3. Avoid unnecessary capturing groups to reduce memory consumption
      4. However, don't forget to use the required group around alternation
      5. Use predefined character classes instead of longer versions
      6. Use the limiting quantifier instead of repeating a character or pattern multiple times
      7. Do not use an unescaped hyphen in the middle of a character class
      8. The mistake of calling matcher.group() without a prior call to matcher.find(), matcher.matches(), or matcher.lookingAt()
      9. Do not use regular expressions to parse XML / HTML data
    2. How to test and benchmark your regular expression performance
    3. Catastrophic or exponential backtracking
      1. How to avoid catastrophic backtracking
    4. Optimization and performance enhancement tips
      1. Use a compiled form of regular expressions
      2. Use a negated character class instead of the greedy and slow .* or .+
      3. Avoid unnecessary grouping
      4. Use lazy quantifiers strategically instead of greedy quantifiers that cause excessive backtracking
      5. Make use of possessive quantifiers to avoid backtracking
      6. Extract common repeating substrings out of alternation
      7. Use atomic groups to avoid backtracking and fail fast
    5. Summary
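The contents above include the union, intersection, and subtraction of character classes, and possessive quantifiers as a way to avoid backtracking. The following is a minimal sketch of both, using standard java.util.regex syntax. The class and method names are hypothetical, and this is not the book's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: class and method names are hypothetical; the
// character-class and quantifier syntax is standard java.util.regex.
class CompositeClasses {
    // Union: a through d, or m through p (the same set as [a-dm-p])
    static final Pattern UNION = Pattern.compile("[a-d[m-p]]");
    // Intersection: only characters present in both sets, i.e. d, e, or f
    static final Pattern INTERSECTION = Pattern.compile("[a-z&&[def]]");
    // Subtraction (intersection with a negated class): lowercase consonants
    static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");
    // Possessive quantifier *+: [^"]*+ never gives characters back. Since [^"]
    // cannot match the closing quote anyway, no valid match is lost, and an
    // unterminated quote fails without backtracking through the text.
    static final Pattern QUOTED = Pattern.compile("\"[^\"]*+\"");

    // Concatenate every match of p found in s
    static String joinMatches(Pattern p, String s) {
        StringBuilder sb = new StringBuilder();
        Matcher m = p.matcher(s);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    // Collect every complete double-quoted string in s
    static List<String> quotedStrings(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = QUOTED.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(joinMatches(UNION, "abcxyzmnop"));
        System.out.println(joinMatches(INTERSECTION, "abcdefgh"));
        System.out.println(joinMatches(CONSONANT, "regex"));
        System.out.println(quotedStrings("say \"hi\" and \"bye\" then \"oops"));
    }
}
```

The unterminated "oops at the end of the last sample is simply skipped, because the possessive quantifier makes that attempt fail immediately.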