2.12. Repeat Part of the Regex a Certain Number of Times

Problem

Create regular expressions that match the following kinds of numbers:

  • A 100-digit decimal number (loosely called a googol in this recipe; the number 10^100 itself has 101 digits).

  • A 32-bit hexadecimal number.

  • A 32-bit hexadecimal number with an optional h suffix.

  • A floating-point number with an optional integer part, a mandatory fractional part, and an optional exponent. Each part allows any number of digits.

Solution

Googol

\b\d{100}\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hexadecimal number

\b[a-f0-9]{1,8}\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hexadecimal number with optional suffix

\b[a-f0-9]{1,8}h?\b
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Floating-point number

\d*\.\d+(e\d+)?
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
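
As a quick check of the four solutions, here is a minimal sketch in Python, one of the listed flavors. The constant names and the first_match helper are our own illustrative choices, not part of the recipe; the "Case insensitive" option is assumed to map to re.IGNORECASE.

```python
import re

# The four patterns from the Solution section, compiled with the stated options.
GOOGOL = re.compile(r"\b\d{100}\b")
HEX32 = re.compile(r"\b[a-f0-9]{1,8}\b", re.IGNORECASE)
HEX32_SUFFIX = re.compile(r"\b[a-f0-9]{1,8}h?\b", re.IGNORECASE)
FLOAT = re.compile(r"\d*\.\d+(e\d+)?", re.IGNORECASE)


def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(first_match(GOOGOL, "7" * 100) is not None)   # exactly 100 digits
    print(first_match(HEX32, "DEADBEEF"))               # 8 hex digits
    print(first_match(HEX32_SUFFIX, "1234h"))           # optional h suffix
    print(first_match(FLOAT, "3.14e10"))                # fraction plus exponent
```

Note that the floating-point regex as written does not allow a sign in the exponent, so searching "1.5e-3" finds only 1.5.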

Discussion

Fixed repetition

The quantifier {n}, where n is a nonnegative integer, repeats the preceding regex token exactly n times. The \d{100} in \b\d{100}\b matches a string of 100 digits. You could achieve the same by typing \d 100 times.
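
The equivalence between \d{100} and \d typed 100 times can be checked with a short sketch, again assuming Python's re module. The variable names are hypothetical, and the long form is built by string repetition rather than typed out.

```python
import re

# Quantifier form from the recipe.
short_form = re.compile(r"\b\d{100}\b")

# The same regex with \d written out 100 times (built programmatically here).
long_form = re.compile(r"\b" + r"\d" * 100 + r"\b")

# Subjects with 99, 100, and 101 digits.
samples = ["1" * 99, "1" * 100, "1" * 101]

# For each subject, record whether each form matches the whole string.
results = [
    (bool(short_form.fullmatch(s)), bool(long_form.fullmatch(s)))
    for s in samples
]

if __name__ == "__main__":
    for sample, (short_ok, long_ok) in zip(samples, results):
        print(len(sample), short_ok, long_ok)
```

Both forms should agree on every subject: only the 100-digit string matches.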

{1} repeats the preceding token once, as it would without any quantifier. ab{1}c is the same regex as abc.

{0} repeats the preceding token zero times, essentially deleting it from the regular expression. ab{0}c is the same regex as ac.
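
A small sketch, assuming Python's re module, that compares the quantified and unquantified forms on a few subject strings. The same_matches helper and the sample subjects are our own, chosen only for illustration.

```python
import re

# Each pair holds a regex with a {1} or {0} quantifier and its plain equivalent.
pairs = [
    (r"ab{1}c", r"abc"),
    (r"ab{0}c", r"ac"),
]

subjects = ["abc", "ac", "abbc", "xacx", "aabcc"]


def same_matches(first, second, texts):
    """Return True if both regexes find identical matches in every text."""
    return all(re.findall(first, t) == re.findall(second, t) for t in texts)


if __name__ == "__main__":
    for quantified, plain in pairs:
        print(quantified, plain, same_matches(quantified, plain, subjects))
```

Agreement on a handful of subjects is not a proof of equivalence, but it illustrates that {1} changes nothing and {0} removes the token.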

Variable repetition

For variable repetition, we use the quantifier {n,m}, where ...
