Chapter 5. Words, Lines, and Special Characters

This chapter contains recipes that deal with finding and manipulating text in a variety of contexts. Some of the recipes show how to do things you might expect from an advanced search engine, such as finding any one of several words or finding words that appear near each other. Other examples help you find entire lines that contain particular words, remove repeated words, or escape regular expression metacharacters.

The central theme of this chapter is showing a variety of regular expression constructs and techniques in action. Working through it exercises a large number of regular expression syntax features, and will help you apply regular expressions generally to the problems you encounter. In many cases, what we search for is simple, but the solutions serve as templates that you can customize for the specific problems you’re facing.

5.1. Find a Specific Word

Problem

You’re given the simple task of finding all occurrences of the word “cat”, case insensitively. The catch is that it must appear as a complete word. You don’t want to find pieces of longer words, such as hellcat, application, or Catwoman.

Solution

Word boundary tokens make this a very easy problem to solve:

\bcat\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Recipe 3.7 shows how you can use this regular expression to find all matches. Recipe 3.14 shows how you can replace matches with other text.
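As a quick illustration of the regex in use, here is a minimal Python sketch. The sample sentence and the variable names (text, whole_word_cat, matches, replaced) are our own assumptions for demonstration and are not taken from the recipes referenced above. The case-insensitive option is supplied through Python's re.IGNORECASE flag.

```python
import re

# Hypothetical sample text that includes the words mentioned in the problem.
text = "My cat chased a hellcat. CAT videos beat any application, says Catwoman."

# \b asserts a word boundary on each side of "cat";
# re.IGNORECASE turns on case-insensitive matching.
whole_word_cat = re.compile(r"\bcat\b", re.IGNORECASE)

# Collect every whole-word match.
matches = whole_word_cat.findall(text)

# Replace every whole-word match with other text.
replaced = whole_word_cat.sub("dog", text)

print(matches)
print(replaced)
```

Only the standalone "cat" and "CAT" match. The occurrences inside hellcat, application, and Catwoman are skipped because at least one side of each lacks a word boundary.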

Discussion ...
