4.6. Validate Traditional Time Formats

Problem

You want to validate times in various traditional time formats, such as hh:mm and hh:mm:ss in both 12-hour and 24-hour formats.

Solution

Hours and minutes, 12-hour clock:

^(1[0-2]|0?[1-9]):([0-5]?[0-9])( ?[AP]M)?$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hours and minutes, 24-hour clock:

^(2[0-3]|[01]?[0-9]):([0-5]?[0-9])$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hours, minutes, and seconds, 12-hour clock:

^(1[0-2]|0?[1-9]):([0-5]?[0-9]):([0-5]?[0-9])( ?[AP]M)?$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Hours, minutes, and seconds, 24-hour clock:

^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

The question marks that follow 0, [01], and [0-5] in the preceding regular expressions make leading zeros optional. Remove those question marks to make leading zeros mandatory.
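As a minimal sketch of how these four patterns might be used from code, the following Python snippet compiles each one and wraps them in a small helper. The names TIME_PATTERNS and is_valid_time are made up for this illustration, and the 12-hour patterns are written with a literal space before the question mark inside the AM/PM group.

```python
import re

# The four patterns from the Solution. The dictionary keys are hypothetical labels.
TIME_PATTERNS = {
    "12h hh:mm": re.compile(r"^(1[0-2]|0?[1-9]):([0-5]?[0-9])( ?[AP]M)?$"),
    "24h hh:mm": re.compile(r"^(2[0-3]|[01]?[0-9]):([0-5]?[0-9])$"),
    "12h hh:mm:ss": re.compile(
        r"^(1[0-2]|0?[1-9]):([0-5]?[0-9]):([0-5]?[0-9])( ?[AP]M)?$"
    ),
    "24h hh:mm:ss": re.compile(
        r"^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$"
    ),
}


def is_valid_time(text, kind):
    """Return True if text as a whole is a time of the given kind.

    In Python, $ also matches just before a trailing line break, so this
    sketch assumes the caller has already stripped newlines from text.
    """
    return TIME_PATTERNS[kind].search(text) is not None


if __name__ == "__main__":
    for sample in ["9:05 PM", "13:00", "0:00", "23:59"]:
        print(sample, is_valid_time(sample, "12h hh:mm"),
              is_valid_time(sample, "24h hh:mm"))
```

Because no regex options are set, the AM/PM indicator has to be uppercase in this sketch.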

Discussion

Validating times is considerably easier than validating dates. Every hour has 60 minutes, and every minute has 60 seconds. This means we don’t need any complicated alternations in the regex. For the minutes and seconds, we don’t use alternation at all. [0-5]?[0-9] matches a digit between 0 and 5, followed by a digit between 0 and 9. This correctly matches any number between 0 and 59. The question mark after the first character class makes it optional. This way, a single digit between 0 and 9 is also accepted as a valid minute or second. Remove the question mark if the first 10 minutes and seconds should be written as 00 to 09. See Recipes 2.3 and 2.12 for details on character classes and quantifiers such as the question mark.

For the hours, we do need to use alternation (see Recipe 2.8). The second digit allows different ranges, depending on the first digit. On a 12-hour clock, if the first digit is 0 or is omitted, the second digit must be between 1 and 9, because there is no hour 0. If the first digit is 1, the second digit must be 0, 1, or 2. In a regular expression, we write this as 1[0-2]|0?[1-9]. On a 24-hour clock, if the first digit is 0 or 1, the second digit allows all 10 digits, but if the first digit is 2, the second digit must be between 0 and 3. In regex syntax, this can be expressed as 2[0-3]|[01]?[0-9]. Again, the question mark allows single-digit hours to be written without a leading zero. Whether you’re working with a 12- or 24-hour clock, remove the question mark to require two digits.
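To make the effect of those question marks concrete, here is a small Python comparison of the 24-hour hh:mm regex with and without them. The variable names OPTIONAL_ZERO and MANDATORY_ZERO and the helper accepts are assumptions made for this sketch only.

```python
import re

# Leading zeros optional, as in the Solution.
OPTIONAL_ZERO = re.compile(r"^(2[0-3]|[01]?[0-9]):([0-5]?[0-9])$")
# Same regex with the question marks removed: two digits are required.
MANDATORY_ZERO = re.compile(r"^(2[0-3]|[01][0-9]):([0-5][0-9])$")


def accepts(pattern, text):
    """Return True if the compiled pattern matches the whole text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in ["7:5", "07:05", "23:59", "24:00"]:
        print(sample, accepts(OPTIONAL_ZERO, sample),
              accepts(MANDATORY_ZERO, sample))
```

Both variants reject 24:00, since neither alternative for the hour can match 24.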

We put parentheses around the parts of the regex that match the hours, minutes, and seconds. That makes it easy to retrieve the digits for the hours, minutes, and seconds, without the colons. Recipe 2.9 explains how parentheses create capturing groups. Recipe 3.9 explains how you can retrieve the text matched by those capturing groups in procedural code.
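One way the capturing groups could be put to work is sketched below in Python. The function name parse_time is hypothetical, and converting the captured digits to integers is just one possible use of them.

```python
import re

# Hours, minutes, and seconds on a 24-hour clock, from the Solution.
TIME_24H = re.compile(r"^(2[0-3]|[01]?[0-9]):([0-5]?[0-9]):([0-5]?[0-9])$")


def parse_time(text):
    """Return (hours, minutes, seconds) as integers, or None if invalid."""
    match = TIME_24H.search(text)
    if match is None:
        return None
    # Groups 1, 2, and 3 hold the digits without the colons.
    return tuple(int(digits) for digits in match.groups())


if __name__ == "__main__":
    print(parse_time("16:08:42"))
    print(parse_time("25:00:00"))
```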

The parentheses around the hour part keep the two alternatives for the hour together. If you remove those parentheses, the alternation splits the whole regex in two, and it won’t work correctly. Removing the parentheses around the minutes and seconds has no effect, other than making it impossible to retrieve their digits separately.
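The following Python sketch illustrates what goes wrong without the hour parentheses. The alternation operator has the lowest precedence, so the ungrouped regex means "2[0-3] at the start of the subject" or "the rest of the pattern at the end of the subject". The names GROUPED and UNGROUPED are made up for this example.

```python
import re

# Correct: the parentheses limit the alternation to the hour.
GROUPED = re.compile(r"^(2[0-3]|[01]?[0-9]):([0-5]?[0-9])$")
# Broken: without them, | splits the entire regex into two halves:
#   ^2[0-3]       or       [01]?[0-9]:([0-5]?[0-9])$
UNGROUPED = re.compile(r"^2[0-3]|[01]?[0-9]:([0-5]?[0-9])$")


def matches(pattern, text):
    """Return True if the compiled pattern finds a match in text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in ["2399", "abc9:30", "9:30"]:
        print(sample, matches(GROUPED, sample), matches(UNGROUPED, sample))
```

The ungrouped version accepts 2399 through its first half and abc9:30 through its second half, although neither is a valid time.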

On a 12-hour clock, we allow the time to be followed by AM or PM. We also allow a space between the time and the AM/PM indicator. [AP]M matches AM or PM. A space followed by a question mark matches an optional space. ( ?[AP]M)? groups the space and the indicator, and makes them optional as one unit. We don’t put the space outside the group, as in  ?([AP]M)?, because that would allow a space even when the indicator is omitted.
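The difference between the two groupings can be checked with a short Python sketch. The names SPACE_INSIDE and SPACE_OUTSIDE are hypothetical. The second pattern is the rejected alternative, with the optional space moved outside the group.

```python
import re

# Space and indicator are optional together, as in the Solution.
SPACE_INSIDE = re.compile(r"^(1[0-2]|0?[1-9]):([0-5]?[0-9])( ?[AP]M)?$")
# Rejected alternative: the space is optional independently of the indicator.
SPACE_OUTSIDE = re.compile(r"^(1[0-2]|0?[1-9]):([0-5]?[0-9]) ?([AP]M)?$")


def accepts(pattern, text):
    """Return True if the compiled pattern matches the whole text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in ["9:30 PM", "9:30PM", "9:30", "9:30 "]:
        print(repr(sample), accepts(SPACE_INSIDE, sample),
              accepts(SPACE_OUTSIDE, sample))
```

Only the last sample, a time followed by a stray space, separates the two patterns: the grouped version rejects it and the alternative accepts it.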

Variations

If you want to search for times in larger bodies of text instead of checking whether the input as a whole is a time, you cannot use the anchors ^ and $. Merely removing the anchors from the regular expression is not the right solution. That would allow the hour and minute regexes to match 12:12 within 9912:1299, for instance. Instead of anchoring the regex match to the start and end of the subject, you have to specify that the time cannot be part of longer sequences of digits.

This is easily done with a pair of word boundaries. In regular expressions, digits are treated as characters that can be part of words. Replace both ^ and $ with \b. As an example:

\b(2[0-3]|[01]?[0-9]):([0-5]?[0-9])\b
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby

Word boundaries don’t disallow everything; they only disallow letters, digits and underscores. The regex just shown, which matches hours and minutes on a 24-hour clock, matches 16:08 within the subject text The time is 16:08:42 sharp. The space is not a word character, whereas the 1 is, so the word boundary matches between them. The 8 is a word character, whereas the colon isn’t, so \b also matches between those two.
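A brief Python sketch of searching with and without word boundaries follows. The helper find_times and the constant names are assumptions for illustration. The helper uses finditer and collects the overall match for each hit.

```python
import re

# Anchors simply removed: matches inside longer digit sequences.
UNANCHORED = re.compile(r"(2[0-3]|[01]?[0-9]):([0-5]?[0-9])")
# Anchors replaced with word boundaries.
BOUNDED = re.compile(r"\b(2[0-3]|[01]?[0-9]):([0-5]?[0-9])\b")


def find_times(pattern, text):
    """Return the text of every non-overlapping match of pattern in text."""
    return [match.group(0) for match in pattern.finditer(text)]


if __name__ == "__main__":
    print(find_times(UNANCHORED, "9912:1299"))
    print(find_times(BOUNDED, "9912:1299"))
    print(find_times(BOUNDED, "The time is 16:08:42 sharp"))
```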

If you want to disallow colons as well as word characters, you need to use lookaround (see Recipe 2.16), as shown in the following regex. Unlike before, this regex will not match any part of The time is 16:08:42 sharp. It only works with flavors that support lookbehind:

(?<![:\w])(2[0-3]|[01]?[0-9]):([0-5]?[0-9])(?![:\w])
Regex options: None
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby 1.9
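The lookaround version can be tried out in Python, whose re module supports fixed-width lookbehind such as the single-character class used here. The function name find_standalone_times is made up for this sketch.

```python
import re

# No colon or word character may directly precede or follow the time.
STANDALONE = re.compile(
    r"(?<![:\w])(2[0-3]|[01]?[0-9]):([0-5]?[0-9])(?![:\w])"
)


def find_standalone_times(text):
    """Return every hh:mm time in text that is not part of a longer token."""
    return [match.group(0) for match in STANDALONE.finditer(text)]


if __name__ == "__main__":
    print(find_standalone_times("The time is 16:08:42 sharp"))
    print(find_standalone_times("Meet at 16:08, not 9:30."))
```

Punctuation such as a comma or period after the time is still allowed, because the lookahead only excludes colons and word characters.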

See Also

This chapter has several other recipes for matching dates and times. Recipes 4.4 and 4.5 show how to validate traditional date formats. Recipe 4.7 shows how to validate date and time formats according to the ISO 8601 standard.

Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.6 explains word boundaries. Recipe 2.8 explains alternation. Recipe 2.9 explains grouping. Recipe 2.12 explains repetition. Recipe 2.16 explains lookaround.
