Chapter 5. Words, Lines, and Special Characters

This chapter contains recipes that deal with finding and manipulating text in a variety of contexts. Some of the recipes show how to do things you might expect from an advanced search engine, such as finding any one of several words or finding words that appear near each other. Other examples help you find entire lines that contain particular words, remove repeated words, or escape regular expression metacharacters.

The central theme of this chapter is showing a variety of regular expression constructs and techniques in action. Reading through it is like a workout for a large number of regular expression syntax features, and will help you apply regular expressions generally to the problems you encounter. In many cases, what we search for is simple, but the templates we provide in the solutions allow you to customize them for the specific problems you’re facing.

5.1. Find a Specific Word


You’re given the simple task of finding all occurrences of the word “cat”, case insensitively. The catch is that it must appear as a complete word. You don’t want to find pieces of longer words, such as hellcat, application, or Catwoman.


Word boundary tokens make this a very easy problem to solve:

\bcat\b

Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Recipe 3.7 shows how you can use this regular expression to find all matches. Recipe 3.14 shows how you can replace matches with other text.
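As a quick illustration of the solution, here is a minimal Python sketch. It assumes the pattern is the literal word wrapped in `\b` word boundary tokens and compiled with the case insensitive option. The names `WORD_PATTERN`, `find_word`, `replace_word`, and the sample sentence are hypothetical and used only for this example; they do not come from the recipes referenced above.

```python
import re

# Assumed pattern: the literal word between two word boundary tokens,
# with case-insensitive matching turned on.
WORD_PATTERN = re.compile(r"\bcat\b", re.IGNORECASE)


def find_word(text):
    """Return (start_index, matched_text) for each whole-word match."""
    return [(m.start(), m.group()) for m in WORD_PATTERN.finditer(text)]


def replace_word(text, replacement):
    """Replace every whole-word match with the given replacement text."""
    return WORD_PATTERN.sub(replacement, text)


# Hypothetical sample text containing whole words and longer words.
sample = "My cat chased a hellcat; Catwoman filed an application. CAT!"

matches = find_word(sample)
replaced = replace_word(sample, "dog")

print(matches)
print(replaced)
```

In this sample, only the standalone "cat" and "CAT" should match. The occurrences inside hellcat, Catwoman, and application are skipped because a word character sits directly next to at least one end of "cat", so at least one of the two `\b` tokens fails there.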

Discussion ...
