
Regular Expressions Cookbook by Steven Levithan, Jan Goyvaerts


4.7. Validate ISO 8601 Dates and Times

Problem

You want to match dates and/or times in the official ISO 8601 format, which is the basis for many standardized date and time formats. For example, in XML Schema, the built-in date, time, and dateTime types are all based on ISO 8601.

Solution

The following matches a calendar month, e.g., 2008-08. The hyphen is required:

^([0-9]{4})-(1[0-2]|0[1-9])$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>[0-9]{4})-(?<month>1[0-2]|0[1-9])$
Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9
^(?P<year>[0-9]{4})-(?P<month>1[0-2]|0[1-9])$
Regex options: None
Regex flavors: PCRE, Python
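
As a minimal sketch of how the Python-flavor regex above might be used, the following wraps it in a helper that returns the year and month as integers. The names MONTH_RE and parse_month are hypothetical and chosen only for this illustration:

```python
import re

# Python-flavor regex from this recipe: year and month, hyphen required.
MONTH_RE = re.compile(r"^(?P<year>[0-9]{4})-(?P<month>1[0-2]|0[1-9])$")

def parse_month(text):
    """Return (year, month) as integers, or None if text is not YYYY-MM."""
    m = MONTH_RE.match(text)
    if m is None:
        return None
    return int(m.group("year")), int(m.group("month"))

print(parse_month("2008-08"))  # (2008, 8)
print(parse_month("2008-13"))  # None: month out of range
print(parse_month("200808"))   # None: hyphen is required
```

In Python, $ also matches just before a newline at the very end of the string, so "2008-08\n" is accepted by this sketch. Replacing $ with \Z in the pattern is one way to rule that out.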

Calendar date, e.g., 2008-08-30. The hyphens are optional. This regex allows YYYY-MMDD and YYYYMM-DD, which do not follow ISO 8601:

^([0-9]{4})-?(1[0-2]|0[1-9])-?(3[0-1]|0[1-9]|[1-2][0-9])$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>[0-9]{4})-?(?<month>1[0-2]|0[1-9])-?↵
(?<day>3[0-1]|0[1-9]|[1-2][0-9])$
Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9

Calendar date, e.g., 2008-08-30. The hyphens are optional. This regex uses a conditional to exclude YYYY-MMDD and YYYYMM-DD. There is an extra capturing group for the first hyphen:

^([0-9]{4})(-)?(1[0-2]|0[1-9])(?(2)-)(3[0-1]|0[1-9]|[1-2][0-9])$
Regex options: None
Regex flavors: .NET, PCRE, Perl, Python
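
The effect of the conditional can be checked with a short Python sketch. The helper name parse_date is an assumption for this example. Because group 2 only holds the first hyphen, the helper skips it when collecting the numbers:

```python
import re

# Conditional version: the second hyphen is required only if group 2
# (the first hyphen) took part in the match.
DATE_RE = re.compile(
    r"^([0-9]{4})(-)?(1[0-2]|0[1-9])(?(2)-)(3[0-1]|0[1-9]|[1-2][0-9])$"
)

def parse_date(text):
    """Return (year, month, day) as integers, or None if there is no match."""
    m = DATE_RE.match(text)
    if m is None:
        return None
    # Group 2 is the optional hyphen, so the numbers are in groups 1, 3, and 4.
    return int(m.group(1)), int(m.group(3)), int(m.group(4))

for candidate in ["2008-08-30", "20080830", "2008-0830", "200808-30"]:
    print(candidate, parse_date(candidate))
```

The first two candidates are accepted and the last two are rejected.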

Calendar date, e.g., 2008-08-30. The hyphens are optional. This regex uses alternation to exclude YYYY-MMDD and YYYYMM-DD. There are two capturing groups for the month:

^([0-9]{4})(?:(1[0-2]|0[1-9])|-(1[0-2]|0[1-9])-)↵
(3[0-1]|0[1-9]|[1-2][0-9])$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
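
With alternation, only one of the two month groups takes part in any match, so calling code has to pick whichever one is set. The sketch below assumes a form of the regex in which both hyphens in the second alternative are required, since that is what excludes YYYY-MMDD and YYYYMM-DD. The helper name parse_date is hypothetical:

```python
import re

# Alternation version: either no hyphens at all (group 2 holds the month)
# or both hyphens present (group 3 holds the month).
DATE_RE = re.compile(
    r"^([0-9]{4})(?:(1[0-2]|0[1-9])|-(1[0-2]|0[1-9])-)"
    r"(3[0-1]|0[1-9]|[1-2][0-9])$"
)

def parse_date(text):
    """Return (year, month, day) as integers, or None if there is no match."""
    m = DATE_RE.match(text)
    if m is None:
        return None
    # Exactly one of groups 2 and 3 participated; the other is None.
    month = m.group(2) or m.group(3)
    return int(m.group(1)), int(month), int(m.group(4))

for candidate in ["2008-08-30", "20080830", "2008-0830", "200808-30"]:
    print(candidate, parse_date(candidate))
```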

Week of the year, e.g., 2008-W35. The hyphen is optional:

^([0-9]{4})-?W(5[0-3]|[1-4][0-9]|0[1-9])$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>[0-9]{4})-?W(?<week>5[0-3]|[1-4][0-9]|0[1-9])$
Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9

Week date, e.g., 2008-W35-6. The hyphens are optional:

^([0-9]{4})-?W(5[0-3]|[1-4][0-9]|0[1-9])-?([1-7])$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>[0-9]{4})-?W(?<week>5[0-3]|[1-4][0-9]|0[1-9])-?(?<day>[1-7])$
Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9

Ordinal date, e.g., 2008-243. The hyphen is optional:

^([0-9]{4})-?(36[0-6]|3[0-5][0-9]|[12][0-9]{2}|0[1-9][0-9]|00[1-9])$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>[0-9]{4})-?↵
(?<day>36[0-6]|3[0-5][0-9]|[12][0-9]{2}|0[1-9][0-9]|00[1-9])$
Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9
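
The ordinal date regex accepts day 366 for every year, including years that are not leap years. One way to handle this, sketched here with Python's standard datetime module rather than with a longer regex, is to convert the match to a calendar date and reject results that spill into the next year. The function name ordinal_to_date is an assumption for this example:

```python
import re
from datetime import date, timedelta

# Named-capture version of the ordinal date regex, in Python syntax.
ORDINAL_RE = re.compile(
    r"^(?P<year>[0-9]{4})-?"
    r"(?P<day>36[0-6]|3[0-5][0-9]|[12][0-9]{2}|0[1-9][0-9]|00[1-9])$"
)

def ordinal_to_date(text):
    """Return a datetime.date for an ordinal date, or None if invalid."""
    m = ORDINAL_RE.match(text)
    if m is None:
        return None
    year, day = int(m.group("year")), int(m.group("day"))
    try:
        result = date(year, 1, 1) + timedelta(days=day - 1)
    except (ValueError, OverflowError):
        # Year 0000 or a result past date.max is outside datetime's range.
        return None
    # Day 366 in a non-leap year lands on January 1 of the following year.
    return result if result.year == year else None

print(ordinal_to_date("2008-243"))  # 2008-08-30
print(ordinal_to_date("2007-366"))  # None
```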

Hours and minutes, e.g., 17:21. The colon is optional:

^(2[0-3]|[01]?[0-9]):?([0-5]?[0-9])$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<hour>2[0-3]|[01]?[0-9]):?(?<minute>[0-5]?[0-9])$
Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9

Hours, minutes, and seconds, e.g., 17:21:59. The colons are optional:

^(2[0-3]|[01]?[0-9]):?([0-5]?[0-9]):?([0-5]?[0-9])$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<hour>2[0-3]|[01]?[0-9]):?(?<minute>[0-5]?[0-9]):?↵
(?<second>[0-5]?[0-9])$
Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9

Time zone designator, e.g., Z, +07 or +07:00. The colon and the minutes are optional:

^(Z|[+-](?:2[0-3]|[01]?[0-9])(?::?(?:[0-5]?[0-9]))?)$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
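
The regex above captures the whole designator in a single group. If the offset itself is needed, one option is a variant with separate groups for the sign, hours, and minutes. The following Python sketch is such a variant, not the regex from this recipe. The group names and the utc_offset function are assumptions made for illustration:

```python
import re
from datetime import timedelta

# Variant of the time zone regex with named groups for each part.
TZ_RE = re.compile(
    r"^(?:Z|(?P<sign>[+-])(?P<hours>2[0-3]|[01]?[0-9])"
    r"(?::?(?P<minutes>[0-5]?[0-9]))?)$"
)

def utc_offset(designator):
    """Return the UTC offset as a timedelta, or None if there is no match."""
    m = TZ_RE.match(designator)
    if m is None:
        return None
    if m.group("sign") is None:
        # No sign was captured, so the designator was "Z".
        return timedelta(0)
    minutes = int(m.group("hours")) * 60 + int(m.group("minutes") or 0)
    if m.group("sign") == "-":
        minutes = -minutes
    return timedelta(minutes=minutes)

for tz in ["Z", "+07", "+07:00", "-0530"]:
    print(tz, utc_offset(tz))
```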

Hours, minutes, and seconds with time zone designator, e.g., 17:21:59+07:00. All the colons are optional. The minutes in the time zone designator are also optional:

^(2[0-3]|[01]?[0-9]):?([0-5]?[0-9]):?([0-5]?[0-9])↵
(Z|[+-](?:2[0-3]|[01]?[0-9])(?::?(?:[0-5]?[0-9]))?)$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<hour>2[0-3]|[01]?[0-9]):?(?<minute>[0-5]?[0-9]):?(?<sec>[0-5]?[0-9])↵
(?<timezone>Z|[+-](?:2[0-3]|[01]?[0-9])(?::?(?:[0-5]?[0-9]))?)$
Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9

Date, with optional time zone, e.g., 2008-08-30 or 2008-08-30+07:00. Hyphens are required. This is the XML Schema date type:

^(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[0-1]|0[1-9]|[1-2][0-9])↵
(Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>-?(?:[1-9][0-9]*)?[0-9]{4})-(?<month>1[0-2]|0[1-9])-↵
(?<day>3[0-1]|0[1-9]|[1-2][0-9])↵
(?<timezone>Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$
Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9

Time, with optional fractional seconds and time zone, e.g., 01:45:36 or 01:45:36.123+07:00. This is the XML Schema time type:

^(2[0-3]|[0-1][0-9]):([0-5][0-9]):([0-5][0-9])(\.[0-9]+)?↵
(Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<hour>2[0-3]|[0-1][0-9]):(?<minute>[0-5][0-9]):(?<second>[0-5][0-9])↵
(?<ms>\.[0-9]+)?(?<timezone>Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$
Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9

Date and time, with optional fractional seconds and time zone, e.g., 2008-08-30T01:45:36 or 2008-08-30T01:45:36.123Z. This is the XML Schema dateTime type:

^(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[0-1]|0[1-9]|[1-2][0-9])↵
T(2[0-3]|[0-1][0-9]):([0-5][0-9]):([0-5][0-9])(\.[0-9]+)?↵
(Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$
Regex options: None
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby
^(?<year>-?(?:[1-9][0-9]*)?[0-9]{4})-(?<month>1[0-2]|0[1-9])-↵
(?<day>3[0-1]|0[1-9]|[1-2][0-9])T(?<hour>2[0-3]|[0-1][0-9]):↵
(?<minute>[0-5][0-9]):(?<second>[0-5][0-9])(?<ms>\.[0-9]+)?↵
(?<timezone>Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$
Regex options: None
Regex flavors: .NET, PCRE 7, Perl 5.10, Ruby 1.9
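
Python is not among the flavors listed for the (?<name>group) syntax, but the same regex can be written with (?P<name>group). The sketch below does that and joins the wrapped lines into one pattern. The function name split_datetime is hypothetical:

```python
import re

# dateTime regex with Python-style named capture; the wrapped lines are
# concatenated into a single pattern.
DATETIME_RE = re.compile(
    r"^(?P<year>-?(?:[1-9][0-9]*)?[0-9]{4})-(?P<month>1[0-2]|0[1-9])-"
    r"(?P<day>3[0-1]|0[1-9]|[1-2][0-9])T(?P<hour>2[0-3]|[0-1][0-9]):"
    r"(?P<minute>[0-5][0-9]):(?P<second>[0-5][0-9])(?P<ms>\.[0-9]+)?"
    r"(?P<timezone>Z|[+-](?:2[0-3]|[0-1][0-9]):[0-5][0-9])?$"
)

def split_datetime(text):
    """Return a dict of the named parts as strings, or None if no match."""
    m = DATETIME_RE.match(text)
    return m.groupdict() if m else None

print(split_datetime("2008-08-30T01:45:36.123Z"))
print(split_datetime("2008-08-30T01:45:36"))
print(split_datetime("2008-08-30 01:45:36"))  # None: the T is required
```

Optional parts that did not take part in the match, such as the fraction or the time zone, come back as None.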

Discussion

ISO 8601 defines a wide range of date and time formats. The regular expressions presented here cover the most common formats, but most systems that use ISO 8601 use only a subset. For example, in XML Schema dates and times, the hyphens and colons are mandatory. To make hyphens and colons mandatory, simply remove the question marks after them. To disallow hyphens and colons, remove each hyphen or colon along with the question mark that follows it. Do watch out for the noncapturing groups, which use the (?:group) syntax. If a question mark and a colon follow an opening parenthesis, those three characters open a noncapturing group, so that question mark and colon must stay in place.
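
As an illustration of these two edits, the following Python sketch derives a colon-required and a colon-free variant from the time zone designator regex. In both variants the (?: that opens each noncapturing group is left untouched. The constant names are assumptions for this example:

```python
import re

# Original: the colon between hours and minutes is optional.
TZ_OPTIONAL = re.compile(
    r"^(Z|[+-](?:2[0-3]|[01]?[0-9])(?::?(?:[0-5]?[0-9]))?)$")

# Colon mandatory: only the question mark after the literal colon is removed.
TZ_REQUIRED = re.compile(
    r"^(Z|[+-](?:2[0-3]|[01]?[0-9])(?::(?:[0-5]?[0-9]))?)$")

# Colon disallowed: the literal colon and its question mark are both removed.
TZ_DISALLOWED = re.compile(
    r"^(Z|[+-](?:2[0-3]|[01]?[0-9])(?:(?:[0-5]?[0-9]))?)$")

for name, regex in [("optional", TZ_OPTIONAL),
                    ("required", TZ_REQUIRED),
                    ("disallowed", TZ_DISALLOWED)]:
    accepted = [tz for tz in ["+07:00", "+0700", "Z"] if regex.match(tz)]
    print(name, accepted)
```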

The regular expressions make the individual hyphens and colons optional, which does not follow ISO 8601 exactly. For example, 1733:26 is not a valid ISO 8601 time, but will be accepted by the time regexes. Requiring all hyphens and colons to be present or omitted at the same time makes your regex quite a bit more complex. We’ve done this as an example for the calendar date regex, once with a conditional and once with alternation, but in practice, as with the XML Schema types, the delimiters are usually required or disallowed rather than optional.
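
The same conditional technique used for the calendar date could be applied to times. The Python sketch below compares the time regex from the Solution with a hypothetical all-or-nothing variant. The variant is not part of this recipe, and it needs a flavor that supports conditionals:

```python
import re

# Time regex from the Solution: each colon is independently optional.
LOOSE_TIME = re.compile(
    r"^(2[0-3]|[01]?[0-9]):?([0-5]?[0-9]):?([0-5]?[0-9])$")

# Hypothetical variant: group 2 captures the first colon, and the
# conditional requires the second colon only when group 2 matched.
STRICT_TIME = re.compile(
    r"^(2[0-3]|[01]?[0-9])(:)?([0-5]?[0-9])(?(2):)([0-5]?[0-9])$")

for candidate in ["17:33:26", "173326", "1733:26", "17:3326"]:
    print(candidate,
          bool(LOOSE_TIME.match(candidate)),
          bool(STRICT_TIME.match(candidate)))
```

Both regexes accept 17:33:26 and 173326, but only the loose one accepts 1733:26 and 17:3326.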

We put parentheses around all the number parts of the regex. That makes it easy to retrieve the numbers for the years, months, days, hours, minutes, seconds, and time zones. Recipe 2.9 explains how parentheses create capturing groups. Recipe 3.9 explains how you can retrieve the text matched by those capturing groups in procedural code.

For most regexes, we also show an alternative using named capture. Some of these date and time formats may be unfamiliar to you or your fellow developers. Named capture makes the regex easier to understand. .NET, PCRE 7, Perl 5.10, and Ruby 1.9 support the (?<name>group) syntax. All versions of PCRE and Python covered in this book support the alternative (?P<name>group) syntax, which adds a P. See Recipe 2.11 and Recipe 3.9 for details.

The number ranges in all the regexes are strict. For example, the calendar day is restricted to the range 01 through 31. You’ll never end up with day 32 or month 13. None of the regexes here attempt to exclude invalid day and month combinations, such as February 31st; Recipe 4.5 explains how you can deal with that.
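
To see the limitation in practice, the Python sketch below matches 2008-02-31 with the hyphen-required calendar date regex and then lets the standard datetime module reject it. This is one possible approach, not necessarily the one taken in Recipe 4.5, and the function name to_date is an assumption:

```python
import re
from datetime import date

# Calendar date with required hyphens, using Python-style named capture.
DATE_RE = re.compile(
    r"^(?P<year>[0-9]{4})-(?P<month>1[0-2]|0[1-9])-"
    r"(?P<day>3[0-1]|0[1-9]|[1-2][0-9])$"
)

def to_date(text):
    """Return a datetime.date, or None if text is not a real calendar date."""
    m = DATE_RE.match(text)
    if m is None:
        return None
    try:
        return date(int(m.group("year")),
                    int(m.group("month")),
                    int(m.group("day")))
    except ValueError:
        # The regex matched, but the day does not exist in that month
        # (or the year is 0000, which datetime does not support).
        return None

print(bool(DATE_RE.match("2008-02-31")))  # True: the regex alone accepts it
print(to_date("2008-02-31"))              # None
print(to_date("2008-02-29"))              # 2008-02-29
```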

Though some of these regexes are quite long, they’re all very straightforward and use the same techniques explained in Recipe 4.4 and Recipe 4.6.

See Also

Recipes 4.4, 4.5, 4.6
