5.7. Find Words Near Each Other


You want to emulate a NEAR search using a regular expression. For readers unfamiliar with the term, some search tools that use Boolean operators such as NOT and OR also have a special operator called NEAR. Searching for “word1 NEAR word2” finds word1 and word2 in any order, as long as they occur within a certain distance of each other.


If you’re searching for just two different words, you can combine two regular expressions—one that matches word1 before word2, and another that flips the order of the words. The following regex allows up to five words to separate the two you’re searching for:

\b(?:word1\W+(?:\w+\W+){0,5}?word2|word2\W+(?:\w+\W+){0,5}?word1)\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

\b(?:
  word1                 # first term
  \W+ (?:\w+\W+){0,5}?  # up to five words
  word2                 # second term
|                       #   or, the same pattern in reverse:
  word2                 # second term
  \W+ (?:\w+\W+){0,5}?  # up to five words
  word1                 # first term
)\b
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby
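
As a rough illustration of the two-way alternation, the following Python sketch builds the compact form of the pattern for any pair of search terms. The helper name `near_pattern` and its `max_gap` parameter are hypothetical and not part of the recipe. The sketch wraps the alternation in `\b(?:` and `)\b` so that each term matches only as a whole word, and it escapes the terms with `re.escape` in case they contain regex metacharacters.

```python
import re

def near_pattern(word1, word2, max_gap=5):
    """Build a NEAR-style regex (hypothetical helper, for illustration).

    Matches word1 and word2 in either order when at most max_gap
    other words appear between them.
    """
    w1, w2 = re.escape(word1), re.escape(word2)
    # One run of non-word characters, then up to max_gap "word + separator" pairs.
    gap = r"\W+(?:\w+\W+){0,%d}?" % max_gap
    return re.compile(
        r"\b(?:" + w1 + gap + w2 + "|" + w2 + gap + w1 + r")\b",
        re.IGNORECASE,
    )

near = near_pattern("regex", "cookbook")

# Reverse order, two words in between: within range.
print(bool(near.search("This cookbook has many regex recipes")))  # True

# Ten words in between: out of range.
print(bool(near.search(
    "A regex is a pattern; this very long sentence eventually mentions a cookbook"
)))  # False
```

Because the quantifier `{0,5}?` is lazy, the match ends at the first occurrence of the second term that falls within range, rather than at the last one.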

The second regular expression here uses the free-spacing option and adds whitespace and comments for readability. Apart from that, the two regular expressions are identical. JavaScript doesn’t support free-spacing mode unless you use the XRegExp library, but the other listed regex flavors allow you to take your pick. Recipes 3.5 and 3.7 show examples of how you can add these regular expressions ...
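
To make the equivalence of the two forms concrete, here is a minimal sketch that assumes Python's `re` module, where free-spacing mode is enabled with the `re.VERBOSE` flag. The variable names and sample strings are made up for illustration. Both patterns are written with the literal placeholder terms `word1` and `word2` and are anchored with word boundaries around a non-capturing group.

```python
import re

# Compact form: no whitespace or comments.
COMPACT = r"\b(?:word1\W+(?:\w+\W+){0,5}?word2|word2\W+(?:\w+\W+){0,5}?word1)\b"

# Free-spacing form: whitespace and # comments are ignored under re.VERBOSE.
VERBOSE = r"""
\b(?:
  word1                 # first term
  \W+ (?:\w+\W+){0,5}?  # up to five words
  word2                 # second term
|                       #   or, the same pattern in reverse:
  word2                 # second term
  \W+ (?:\w+\W+){0,5}?  # up to five words
  word1                 # first term
)\b
"""

compact_re = re.compile(COMPACT, re.IGNORECASE)
verbose_re = re.compile(VERBOSE, re.IGNORECASE | re.VERBOSE)

samples = [
    "Word1 comes just before word2.",   # forward order, three words between
    "word2, then word1",                # reverse order, one word between
    "word1 a b c d e f word2",          # six words between: too far apart
]

for text in samples:
    a = compact_re.search(text)
    b = verbose_re.search(text)
    # Both forms should agree on whether and what they match.
    print(text, "->", a.group(0) if a else None, "|", b.group(0) if b else None)
```

Each sample should produce the same result from both compiled patterns, since the only difference between them is the ignored whitespace and comments.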
