Splitting Up Complex Expressions

We've seen in some examples that regular expressions can quickly become quite complex. One way of battling complexity is to split a complex expression into several simpler ones. A good example of this is finding hyphens in ranges and replacing them with en dashes. As we will see, it's not easy to come up with a single expression that does this.

Replace Hyphens in Page Ranges with En Dashes

The ranges we're after are ranges of numbers, both Arabic and Roman (e.g., 34-78, v-ix), ranges of numbers in parentheses (as in (23)-(27)), and certain letter ranges (a-d), which could be preceded by a number (as in 6b-d). The first type, ranges of Arabic numbers such as 23-56, can be handled with these expressions:

Find what: (?x) \b (\d+) - (\d+) \b

Change to: $1~=$2

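We need to specify word boundaries so that the expression matches whole numbers only. A search for \d-\d matches all kinds of things that are unlikely to be page ranges, such as ISBNs, telephone numbers, and grant numbers; as a rule, anything with more than one hyphen is not a page range. Note, however, that \b also matches between a digit and a hyphen, so word boundaries alone do not rule out such strings: in 555-123-4567 the expression still matches 555-123.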
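To try the expression outside InDesign, here is a minimal Python sketch, assuming Python's re module treats this pattern the way InDesign's GREP does: (?x), \b, and capture groups carry over directly, InDesign's $1 and $2 become group references, and ~= becomes the en dash character U+2013. The function names and the stricter lookaround variant are illustrative assumptions, not the book's expressions; the variant shows one way to skip multi-hyphen strings such as telephone numbers entirely.

```python
import re

EN_DASH = "\u2013"  # the character InDesign's ~= inserts

# The section's expression, transcribed for Python's re module.
# (?x) is free-spacing mode, so the spaces in the pattern are ignored.
ARABIC_RANGE = re.compile(r"(?x) \b (\d+) - (\d+) \b")

# A stricter variant (an assumption, not the book's expression): the
# lookarounds refuse a match when a word character or another hyphen
# touches either end, so strings like 555-123-4567 are left alone.
STRICT_ARABIC_RANGE = re.compile(r"(?x) (?<![\w-]) (\d+) - (\d+) (?![\w-])")


def to_en_dash(match: re.Match) -> str:
    # Equivalent of the Change to field $1~=$2
    return match.group(1) + EN_DASH + match.group(2)


def dash_arabic_ranges(text: str, strict: bool = False) -> str:
    """Replace the hyphen in ranges such as 23-56 with an en dash."""
    pattern = STRICT_ARABIC_RANGE if strict else ARABIC_RANGE
    return pattern.sub(to_en_dash, text)


print(dash_arabic_ranges("pp. 23-56"))                       # pp. 23–56
print(dash_arabic_ranges("tel. 555-123-4567"))               # tel. 555–123-4567
print(dash_arabic_ranges("tel. 555-123-4567", strict=True))  # tel. 555-123-4567
```

Whether the lookbehind version behaves identically inside InDesign is not something this sketch checks; it only demonstrates the idea in Python.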

Hyphens in ranges of Arabic numbers in parentheses such as (12)-(14) can be captured with the following expressions:

Find what: (?x) \b ([\d()]+) - ([\d()]+) \b

Change to: $1~=$2

In order to match a number with its enclosing parentheses, we need to define a character class of \d, (, and ) (note again that the parentheses needn't be escaped in a character class).
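The approach of using several simple expressions instead of one complex one can be sketched in Python as a list of passes applied in turn, much like running Change All once per expression. As before, this is a transcription that assumes Python's re behaves like InDesign's GREP for these patterns; `PASSES` and `dash_ranges` are hypothetical names.

```python
import re

EN_DASH = "\u2013"  # the character InDesign's ~= inserts

# Two simple expressions from this section, run one after the other
# instead of being merged into a single complex one.
PASSES = [
    re.compile(r"(?x) \b (\d+) - (\d+) \b"),          # 23-56
    # ( and ) need no escaping inside the character class
    re.compile(r"(?x) \b ([\d()]+) - ([\d()]+) \b"),  # (12)-(14)
]


def dash_ranges(text: str) -> str:
    for pattern in PASSES:
        # Each pass is like a separate Change All with $1~=$2
        text = pattern.sub(lambda m: m.group(1) + EN_DASH + m.group(2), text)
    return text


print(dash_ranges("pp. 23-56 and (12)-(14)"))  # pp. 23–56 and (12)–(14)
```

Tracing the second pattern on (12)-(14) shows that the match is actually 12)-(14: when the range is surrounded by spaces or punctuation, \b cannot fall just outside the outer parentheses, so they stay outside the match. Only the hyphen changes, so the result is the same.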

Hyphens in Roman page ranges such as ii-xv are replaced with an en dash ...
