Credit: Brett Cannon
In Python 2.2 and earlier, Python’s time module supplies the parsing function strptime only on some platforms, and not on Windows. Example 17-2 shows a strptime function that is a pure Python implementation of the time.strptime function that comes with Python. It follows the behavior that the standard Python documentation specifies for time.strptime, and it accepts two more optional arguments, as shown in the following signature:
strptime(string, format="%a %b %d %H:%M:%S %Y", option=AS_IS, locale_setting=ENGLISH)
option’s default value of AS_IS gets time information from the string, without any checking or filling-in. You can pass option as CHECK, so that the function makes sure that whatever information it gets is within reasonable ranges (raising StrptimeError otherwise), or as FILL_IN (like CHECK, but it also tries to fill in any missing information that can be computed). locale_setting accepts a locale tuple (as created by LocaleAssembly) to specify names of days, months, and so on. Currently, ENGLISH and SWEDISH locale tuples are built into this recipe’s strptime module.
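The difference between CHECK and FILL_IN can be sketched with a small stand-alone example. This is only an illustration written for Python 3, not the recipe's code: the helper names check_fields and fill_in are hypothetical, ValueError stands in for the recipe's StrptimeError, and the standard datetime module replaces the recipe's own calendar arithmetic.

```python
import datetime

def check_fields(month=None, day=None, hour=None):
    """Raise ValueError if a supplied field is outside its valid range.

    Fields left as None are skipped, mirroring the idea that only the
    information actually found in the string gets checked.
    """
    ranges = {'month': (month, 1, 12),
              'day': (day, 1, 31),
              'hour': (hour, 0, 23)}
    for name, (value, low, high) in ranges.items():
        if value is not None and not low <= value <= high:
            raise ValueError('%s incorrect' % name)

def fill_in(year, month, day):
    """Derive the day of the year (Jan 1 == 1) and weekday (Monday == 0)."""
    date = datetime.date(year, month, day)
    return date.timetuple().tm_yday, date.weekday()

check_fields(month=3, day=14, hour=9)        # in range, so nothing happens
day_of_year, weekday = fill_in(2002, 3, 14)  # values the string never gave
print(day_of_year, weekday)
```

Checking comes first and filling in second, so that derived values are never computed from fields that are already known to be out of range.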
Although this recipe’s strptime cannot be as fast as the version in the standard Python library, that’s hardly ever a major consideration for typical strptime use. This recipe does offer two substantial advantages. It runs on any platform supporting Python and gives identical results on different platforms, while time.strptime exists only on some platforms and tends to have different quirks on each platform that supplies it. The optional checking and filling-in of information that this recipe provides are also quite handy.
The locale-setting support of this version of strptime was inspired by that in Andrew Markebo’s own strptime, which you can find at http://www.fukt.hk-r.se/~flognat/hacks/strptime.py. However, this recipe has a more complete implementation of strptime’s specification that is based on regular expressions, rather than relying on whitespace and miscellaneous characters to split strings. For example, this recipe can correctly parse strings based on a format such as "%Y%m%d", in which no separator characters delimit the fields.
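The regular-expression approach can be sketched in a few lines. The following is a minimal illustration for Python 3 rather than the recipe's actual code: the DIRECTIVES table and the format_to_regex helper are hypothetical names, and only three directives are covered. Each directive becomes a named group, so the fields can be pulled apart even when nothing separates them.

```python
import re

# Hypothetical, cut-down directive table: each directive maps to a
# regex with a named group, in the spirit of the recipe's BasicDict.
DIRECTIVES = {
    'Y': r'(?P<Y>\d\d\d\d)',   # year with century
    'm': r'(?P<m>[01]\d)',     # month [01,12]
    'd': r'(?P<d>[0-3]\d)',    # day of the month [01,31]
}

def format_to_regex(fmt):
    """Translate a strptime-style format string into a compiled regex."""
    parts = []
    chars = iter(fmt)
    for char in chars:
        if char == '%':
            directive = next(chars)            # the character after '%'
            parts.append(DIRECTIVES[directive])
        else:
            parts.append(re.escape(char))      # literal text in the format
    return re.compile(''.join(parts))

match = format_to_regex('%Y%m%d').match('20020314')
fields = {name: int(value) for name, value in match.groupdict().items()}
print(fields)
```

Because every group has a fixed width, the pattern splits "20020314" into year, month, and day without any delimiter, which an approach based on splitting at whitespace or punctuation cannot do.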
Example 17-2. Parsing a string into a date/time object portably
"""
A pure-Python version of strptime.

As close as possible to time.strptime's specs in the official Python docs.
Locales supported via LocaleAssembly -- examples supplied for English and
Swedish, follow the examples to add your own locales.

Thanks to Andrew Markebo for his pure Python version of strptime, which
convinced me to improve locale support -- and, of course, to Guido van Rossum
and all other contributors to Python, the best language I've ever used!
"""
import re

__all__ = ['strptime', 'AS_IS', 'CHECK', 'FILL_IN',
           'LocaleAssembly', 'ENGLISH', 'SWEDISH']

# metadata module
__author__ = 'Brett Cannon'
__email__ = 'drifty@bigfoot.com'
__version__ = '1.5cb'
__url__ = 'http://www.drifty.org/'

# global settings and parameter constants
CENTURY = 2000
AS_IS = 'AS_IS'
CHECK = 'CHECK'
FILL_IN = 'FILL_IN'

def LocaleAssembly(DirectiveDict, MonthDict, DayDict, am_pmTuple):
    """ Creates locale tuple for use by strptime.
    Accepts as arguments the dictionaries DirectiveDict (locale-specific
    regexes for extracting info from time strings), MonthDict (locale-specific
    full and abbreviated month names), DayDict (locale-specific full and
    abbreviated weekday names), and the am_pmTuple tuple (locale-specific
    valid representations of AM and PM, as a two-item tuple). Look at how the
    ENGLISH dictionary is created for an example; make sure your dictionary
    has values corresponding to each entry in the ENGLISH dictionary. You can
    override any value in the BasicDict with an entry in DirectiveDict.
    """
    BasicDict = {
        '%d':r'(?P<d>[0-3]\d)',    # Day of the month [01,31]
        '%H':r'(?P<H>[0-2]\d)',    # Hour (24-h) [00,23]
        '%I':r'(?P<I>[01]\d)',     # Hour (12-h) [01,12]
        '%j':r'(?P<j>[0-3]\d\d)',  # Day of the year [001,366]
        '%m':r'(?P<m>[01]\d)',     # Month [01,12]
        '%M':r'(?P<M>[0-5]\d)',    # Minute [00,59]
        '%S':r'(?P<S>[0-6]\d)',    # Second [00,61]
        '%U':r'(?P<U>[0-5]\d)',    # Week in the year, Sunday first [00,53]
        '%w':r'(?P<w>[0-6])',      # Weekday [0(Sunday),6]
        '%W':r'(?P<W>[0-5]\d)',    # Week in the year, Monday first [00,53]
        '%y':r'(?P<y>\d\d)',       # Year without century [00,99]
        '%Y':r'(?P<Y>\d\d\d\d)',   # Year with century
        '%Z':r'(?P<Z>(\D+ Time)|([\S\D]{3,3}))',  # Timezone name or empty
        '%%':r'(?P<percent>%)'     # Literal "%" (ignored, in the end)
        }
    BasicDict.update(DirectiveDict)
    return BasicDict, MonthDict, DayDict, am_pmTuple

# helper function to build locales' month and day dictionaries
def _enum_with_abvs(start, *names):
    result = {}
    for i in range(len(names)):
        # map both the full name and its three-letter abbreviation
        result[names[i]] = result[names[i][:3]] = i+start
    return result

# Built-in locales
ENGLISH_Lang = (
    {'%a':r'(?P<a>[^\s\d]{3,3})',  # Abbreviated weekday name
     '%A':r'(?P<A>[^\s\d]{6,9})',  # Full weekday name
     '%b':r'(?P<b>[^\s\d]{3,3})',  # Abbreviated month name
     '%B':r'(?P<B>[^\s\d]{3,9})',  # Full month name
     # Appropriate date and time representation.
     '%c':r'(?P<m>\d\d)/(?P<d>\d\d)/(?P<y>\d\d) '
          r'(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)',
     '%p':r'(?P<p>(a|A|p|P)(m|M))',  # Equivalent of either AM or PM
     # Appropriate date representation
     '%x':r'(?P<m>\d\d)/(?P<d>\d\d)/(?P<y>\d\d)',
     # Appropriate time representation
     '%X':r'(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)'},
    _enum_with_abvs(1, 'January', 'February', 'March', 'April', 'May',
        'June', 'July', 'August', 'September', 'October', 'November',
        'December'),
    _enum_with_abvs(0, 'Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday'),
    (('am','AM'),('pm','PM'))
    )
ENGLISH = LocaleAssembly(*ENGLISH_Lang)

SWEDISH_Lang = (
    {'%a':r'(?P<a>[^\s\d]{3,3})',
     '%A':r'(?P<A>[^\s\d]{6,7})',
     '%b':r'(?P<b>[^\s\d]{3,3})',
     '%B':r'(?P<B>[^\s\d]{3,8})',
     '%c':r'(?P<a>[^\s\d]{3,3}) (?P<d>[0-3]\d) '
          r'(?P<b>[^\s\d]{3,3}) (?P<Y>\d\d\d\d) '
          r'(?P<H>[0-2]\d):(?P<M>[0-5]\d):(?P<S>[0-6]\d)',
     '%p':r'(?P<p>(a|A|p|P)(m|M))',
     '%x':r'(?P<m>\d\d)/(?P<d>\d\d)/(?P<y>\d\d)',
     '%X':r'(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)'},
    _enum_with_abvs(1, 'Januari', 'Februari', 'Mars', 'April', 'Maj',
        'Juni', 'Juli', 'Augusti', 'September', 'Oktober', 'November',
        'December'),
    _enum_with_abvs(0, 'Måndag', 'Tisdag', 'Onsdag', 'Torsdag',
        'Fredag', 'Lördag', 'Söndag'),
    (('am','AM'),('pm','PM'))
    )
SWEDISH = LocaleAssembly(*SWEDISH_Lang)

class StrptimeError(Exception):
    """ Exception class for the module """

def _g2j(y, m, d):
    """ Gregorian-to-Julian utility function, used by _StrpObj """
    a = (14-m)//12
    y = y+4800-a
    m = m+12*a-3
    return d+((153*m+2)//5)+365*y+y//4-y//100+y//400-32045

class _StrpObj:
    """ An object with basic time-manipulation methods """

    def __init__(self, year=None, month=None, day=None, hour=None,
                 minute=None, second=None, day_week=None,
                 julian_date=None, daylight=None):
        """ Sets up instances variables. All values can be set at
        initialization. Any info left out is automatically set to None.
        """
        def _set_vars(_adict, **kwds):
            _adict.update(kwds)
        _set_vars(self.__dict__, **vars())

    def julianFirst(self):
        """ Calculates the Julian date for the first day of year self.year """
        return _g2j(self.year, 1, 1)

    def gregToJulian(self):
        """ Converts the Gregorian date to day within year (Jan 1 == 1) """
        julian_day = _g2j(self.year, self.month, self.day)
        return julian_day-self.julianFirst()+1

    def julianToGreg(self):
        """ Converts the Julian date to the Gregorian date """
        julian_day = self.julian_date+self.julianFirst()-1
        a = julian_day+32044
        b = (4*a+3)//146097
        c = a-((146097*b)//4)
        d = (4*c+3)//1461
        e = c-((1461*d)//4)
        m = (5*e+2)//153
        day = e-((153*m+2)//5)+1
        month = m+3-12*(m//10)
        year = 100*b+d-4800+(m//10)
        return year, month, day

    def dayWeek(self):
        """ Figures out the day of the week using self.year, self.month, and
        self.day. Monday is 0.
        """
        a = (14-self.month)//12
        y = self.year-a
        m = self.month+12*a-2
        day_week = (self.day+y+(y//4)-(y//100)+(y//400)+((31*m)//12))%7
        # the formula yields Sunday == 0; shift so that Monday == 0
        if day_week==0:
            day_week = 6
        else:
            day_week = day_week-1
        return day_week

    def FillInInfo(self):
        """ Based on the current time information, it figures out what other
        info can be filled in.
        """
        if self.julian_date is None and self.year and self.month and self.day:
            julian_date = self.gregToJulian()
            self.julian_date = julian_date
        if (self.month is None or self.day is None
            ) and self.year and self.julian_date:
            gregorian = self.julianToGreg()
            self.month = gregorian[1]  # year ignored, must already be okay
            self.day = gregorian[2]
        if self.day_week is None and self.year and self.month and self.day:
            self.day_week = self.dayWeek()

    def CheckIntegrity(self):
        """ Checks info integrity based on the range that a number can be.
        Any invalid info raises StrptimeError.
        """
        def _check(value, low, high, name):
            # both bounds are inclusive
            if value is not None and not low<=value<=high:
                raise StrptimeError("%s incorrect"%name)
        _check(self.month, 1, 12, 'Month')
        _check(self.day, 1, 31, 'Day')
        _check(self.hour, 0, 23, 'Hour')
        _check(self.minute, 0, 59, 'Minute')
        _check(self.second, 0, 61, 'Second')  # 61 covers leap seconds
        _check(self.day_week, 0, 6, 'Day of the Week')
        _check(self.julian_date, 0, 366, 'Julian Date')
        _check(self.daylight, -1, 1, 'Daylight Savings')

    def return_time(self):
        """ Returns a tuple of numbers in the format used by time.gmtime().
        All instances of None in the information are replaced with 0.
        """
        temp_time = (self.year, self.month, self.day, self.hour,
                     self.minute, self.second, self.day_week,
                     self.julian_date, self.daylight)
        return tuple([t or 0 for t in temp_time])

    def RECreation(self, format, DIRECTIVEDict):
        """ Creates re based on format string and DIRECTIVEDict """
        Directive = 0
        REString = []
        for char in format:
            if char=='%' and not Directive:
                Directive = 1
            elif Directive:
                try:
                    REString.append(DIRECTIVEDict['%'+char])
                except KeyError:
                    raise StrptimeError("Invalid format %s"%char)
                Directive = 0
            else:
                REString.append(char)
        return re.compile(''.join(REString), re.IGNORECASE)

    def convert(self, string, format, locale_setting):
        """ Gets time info from string based on format string and a locale
        created by LocaleAssembly()
        """
        DIRECTIVEDict, MONTHDict, DAYDict, AM_PM = locale_setting
        REComp = self.RECreation(format, DIRECTIVEDict)
        reobj = REComp.match(string)
        if reobj is None:
            raise StrptimeError("Invalid string (%s)"%string)
        for found in reobj.groupdict().keys():
            if found in ('y','Y'):                  # year
                if found=='y':                      # without century
                    self.year = CENTURY+int(reobj.group('y'))
                else:                               # with century
                    self.year = int(reobj.group('Y'))
            elif found in ('b','B','m'):            # month
                if found=='m':                      # month number
                    self.month = int(reobj.group(found))
                else:                               # month name
                    try:
                        self.month = MONTHDict[reobj.group(found)]
                    except KeyError:
                        raise StrptimeError('Unrecognized month')
            elif found=='d':                        # day of the month
                self.day = int(reobj.group(found))
            elif found in ('H','I'):                # hour
                hour = int(reobj.group(found))
                if found=='H':                      # hour number
                    self.hour = hour
                else:                               # AM/PM format
                    try:
                        if reobj.group('p') in AM_PM[0]:
                            AP = 0
                        else:
                            AP = 1
                    except IndexError:  # the format had no %p directive
                        raise StrptimeError(
                            'Lacking needed AM/PM information')
                    if AP:
                        if hour==12:
                            self.hour = 12
                        else:
                            self.hour = 12+hour
                    else:
                        if hour==12:
                            self.hour = 0
                        else:
                            self.hour = hour
            elif found=='M':                        # minute
                self.minute = int(reobj.group(found))
            elif found=='S':                        # second
                self.second = int(reobj.group(found))
            elif found in ('a','A','w'):            # Day of the week
                if found=='w':                      # DOW number
                    day_value = int(reobj.group(found))
                    if day_value==0:
                        self.day_week = 6
                    else:
                        self.day_week = day_value-1
                else:                               # DOW name
                    try:
                        self.day_week = DAYDict[reobj.group(found)]
                    except KeyError:
                        raise StrptimeError('Unrecognized day')
            elif found=='j':                        # Julian date
                self.julian_date = int(reobj.group(found))
            elif found=='Z':                        # daylight savings
                TZ = reobj.group(found)
                if len(TZ)==3:
                    if TZ[1] in ('D','d'):
                        self.daylight = 1
                    else:
                        self.daylight = 0
                elif TZ.find('Daylight')!=-1:
                    self.daylight = 1
                else:
                    self.daylight = 0

def strptime(string, format='%a %b %d %H:%M:%S %Y',
             option=AS_IS, locale_setting=ENGLISH):
    """ Returns a tuple representing the time represented in 'string'.
    Valid values for 'option' are AS_IS, CHECK, and FILL_IN.
    'locale_setting' accepts locale tuples created by LocaleAssembly().
    """
    Obj = _StrpObj()
    Obj.convert(string, format, locale_setting)
    if option in (FILL_IN, CHECK):
        Obj.CheckIntegrity()
    if option == FILL_IN:
        Obj.FillInInfo()
    return Obj.return_time()
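The calendar arithmetic behind _g2j and gregToJulian can be checked in isolation. The sketch below, written for Python 3, repeats the same integer formula under the hypothetical names gregorian_to_jdn and day_of_year and compares the results with the standard datetime module. It is a sanity check on the technique, not part of the recipe.

```python
import datetime

def gregorian_to_jdn(year, month, day):
    """Julian Day Number of a Gregorian date (same arithmetic as _g2j)."""
    a = (14 - month) // 12      # 1 for January and February, otherwise 0
    y = year + 4800 - a         # years counted from a March-based epoch
    m = month + 12 * a - 3      # March becomes month 0, February month 11
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)

def day_of_year(year, month, day):
    """Day within the year (Jan 1 == 1), as gregToJulian computes it."""
    return (gregorian_to_jdn(year, month, day)
            - gregorian_to_jdn(year, 1, 1) + 1)

# Cross-check against the standard library for one sample date.
date = datetime.date(2002, 3, 14)
assert day_of_year(2002, 3, 14) == date.timetuple().tm_yday
# Julian Day Numbers count from a Monday, so JDN % 7 gives Monday == 0.
assert gregorian_to_jdn(2002, 3, 14) % 7 == date.weekday()
print(gregorian_to_jdn(2000, 1, 1), day_of_year(2002, 3, 14))
```

Shifting the start of the year to March puts the leap day at the very end of the shifted year, which is what lets a single expression, (153*m+2)//5, give the number of days before each month.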
The most up-to-date version of strptime is always available at http://www.ocf.berkeley.edu/~bac/Askewed_Thoughts/HTML/code/index.php3#strptime, where you will also find a test suite using PyUnit; Andrew Markebo’s version of strptime is at http://www.fukt.hk-r.se/~flognat/hacks/strptime.py.