Chapter 7. Handling Dates and Times
7.0 Introduction
Dates and times (datetimes) are frequently encountered during preprocessing for machine learning, whether it is the time of a particular sale or the year of some public health statistic. In this chapter, we will build a toolbox of strategies for handling time series data, including tackling time zones and creating lagged time features. Specifically, we will focus on the time series tools in the pandas library, which centralizes the functionality of many other libraries.
7.1 Converting Strings to Dates
Problem
Given a vector of strings representing dates and times, you want to transform them into time series data.
Solution
Use pandas’ to_datetime with the format of the date and/or time specified in the format parameter:
# Load libraries
import numpy as np
import pandas as pd

# Create strings
date_strings = np.array(['03-04-2005 11:35 PM',
                         '23-05-2010 12:01 AM',
                         '04-09-2009 09:09 PM'])

# Convert to datetimes
[pd.to_datetime(date, format='%d-%m-%Y %I:%M %p') for date in date_strings]
[Timestamp('2005-04-03 23:35:00'),
Timestamp('2010-05-23 00:01:00'),
Timestamp('2009-09-04 21:09:00')]
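Because to_datetime also accepts array-like input, the list comprehension is not strictly necessary. A minimal sketch, reusing the same strings, passes the whole array in a single call and gets back a DatetimeIndex instead of a list of Timestamps. The comments spell out what each directive in the format string matches; these are the standard Python strftime/strptime codes.

```python
import numpy as np
import pandas as pd

date_strings = np.array(['03-04-2005 11:35 PM',
                         '23-05-2010 12:01 AM',
                         '04-09-2009 09:09 PM'])

# %d = day (01-31), %m = month (01-12), %Y = four-digit year,
# %I = hour on a 12-hour clock, %M = minute, %p = AM/PM marker
date_format = '%d-%m-%Y %I:%M %p'

# One vectorized call converts the whole array into a DatetimeIndex
datetimes = pd.to_datetime(date_strings, format=date_format)
print(datetimes)
```

Note that '12:01 AM' is read as 00:01 because %I together with %p interprets 12 AM as midnight.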
We might also want to pass an argument to the errors parameter to handle strings that cannot be parsed:
# Convert to datetimes
[pd.to_datetime(date, format="%d-%m-%Y %I:%M %p", errors="coerce")
 for date in date_strings]
[Timestamp('2005-04-03 23:35:00'),
Timestamp('2010-05-23 00:01:00'),
Timestamp('2009-09-04 21:09:00')]
If errors="coerce", then any problem that occurs ...
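To see what coercion does in practice, the sketch below feeds in a hypothetical malformed string (month 13) alongside two valid ones. With errors="coerce" the bad entry becomes NaT, pandas' marker for a missing datetime, while the default errors="raise" stops with a ValueError. The malformed input is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the second string has an impossible month (13)
date_strings = np.array(['03-04-2005 11:35 PM',
                         '23-13-2010 12:01 AM',
                         '04-09-2009 09:09 PM'])
date_format = '%d-%m-%Y %I:%M %p'

# errors="coerce": the unparseable string becomes NaT instead of raising
coerced = pd.to_datetime(date_strings, format=date_format, errors='coerce')
print(coerced)

# errors="raise" (the default): the same input raises a ValueError
raised = False
try:
    pd.to_datetime(date_strings, format=date_format)
except ValueError as err:
    raised = True
    print('ValueError:', err)
```

Coercing is convenient for messy data, but it silently turns bad records into missing values, so it is worth counting them afterward, for example with coerced.isna().sum().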