Chapter 5. Words, Lines, and Special Characters
This chapter contains recipes that deal with finding and manipulating text in a variety of contexts. Some of the recipes show how to do things you might expect from an advanced search engine, such as finding any one of several words or finding words that appear near each other. Other examples help you find entire lines that contain particular words, remove repeated words, or escape regular expression metacharacters.
The central theme of this chapter is showing a variety of regular expression constructs and techniques in action. Reading through it is like a workout for a large number of regular expression syntax features, and will help you apply regular expressions generally to the problems you encounter. In many cases, what we search for is simple, but the templates we provide in the solutions allow you to customize them for the specific problems you’re facing.
5.1. Find a Specific Word
Problem
You’re given the simple task of finding all occurrences of the word “cat”, case insensitively. The catch is that it must appear as a complete word. You don’t want to find pieces of longer words, such as hellcat, application, or Catwoman.
Solution
Word boundary tokens make this a very easy problem to solve:
\bcat\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
Recipe 3.7 shows how you can use this regular expression to find all matches. Recipe 3.14 shows how you can replace matches with other text.
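As a minimal sketch of how this regular expression behaves, the following Python snippet applies it with the case insensitive option turned on. The sample text and the helper names find_cat and replace_cat are made up for this illustration and are not taken from the recipes referenced above.

```python
import re

# Hypothetical sample text that mixes the standalone word with
# longer words that merely contain the letters "cat".
SAMPLE = "My cat met Catwoman and a hellcat. CAT! The application crashed; cat."

# \b matches at a word boundary, so "cat" must stand alone.
# re.IGNORECASE corresponds to the "Case insensitive" regex option.
WORD_CAT = re.compile(r"\bcat\b", re.IGNORECASE)

def find_cat(text):
    """Return every standalone occurrence of "cat", in any letter case."""
    return [m.group() for m in WORD_CAT.finditer(text)]

def replace_cat(text, replacement):
    """Replace every standalone occurrence of "cat" with replacement."""
    return WORD_CAT.sub(replacement, text)

print(find_cat(SAMPLE))            # ['cat', 'CAT', 'cat']
print(replace_cat(SAMPLE, "dog"))
```

Catwoman, hellcat, and application are skipped because in each of them a word character sits directly before or after "cat", so at least one of the two word boundaries fails to match.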
Discussion ...