Chapter 7. Handling Dates and Times
7.0 Introduction
Dates and times (datetimes), such as the time of a particular sale or the date of a public health statistic, are frequently encountered during preprocessing for machine learning. Longitudinal data (or time series data) is data that’s collected repeatedly for the same variables over points in time. In this chapter, we will build a toolbox of strategies for handling time series data, including tackling time zones and creating lagged time features. Specifically, we will focus on the time series tools in the pandas library, which centralizes the functionality of many other general libraries such as datetime.
7.1 Converting Strings to Dates
Problem
Given a vector of strings representing dates and times, you want to transform them into time series data.
Solution
Use pandas’ to_datetime with the format of the date and/or time
specified in the format parameter:
# Load libraries
import numpy as np
import pandas as pd

# Create strings
date_strings = np.array(['03-04-2005 11:35 PM',
                         '23-05-2010 12:01 AM',
                         '04-09-2009 09:09 PM'])

# Convert to datetimes
[pd.to_datetime(date, format='%d-%m-%Y %I:%M %p') for date in date_strings]
[Timestamp('2005-04-03 23:35:00'),
Timestamp('2010-05-23 00:01:00'),
Timestamp('2009-09-04 21:09:00')]
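The list comprehension above converts one string at a time. As a minimal sketch of an alternative, the same array can be passed to to_datetime in a single call, which returns a DatetimeIndex rather than a list of Timestamp objects. The variable name dates is only illustrative:

```python
# Load libraries
import numpy as np
import pandas as pd

# Create strings (same values as in the solution)
date_strings = np.array(['03-04-2005 11:35 PM',
                         '23-05-2010 12:01 AM',
                         '04-09-2009 09:09 PM'])

# Convert the whole array in one call; the result is a DatetimeIndex
dates = pd.to_datetime(date_strings, format='%d-%m-%Y %I:%M %p')

# Individual elements are still Timestamp objects
print(dates[0])
```

Whether to prefer this form over the comprehension depends on what the downstream code expects: a DatetimeIndex supports vectorized datetime operations, while a plain list does not.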
We might also want to add an argument to the errors parameter to
handle problems:
# Convert to datetimes
[pd.to_datetime(date, format="%d-%m-%Y %I:%M %p", errors="coerce")
 for date in date_strings]
[Timestamp('2005-04-03 ...
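To see what errors="coerce" changes, it helps to include a value that cannot be parsed. The sketch below assumes a hypothetical input list, mixed_strings, whose second entry is not a date; with errors="coerce", pandas returns NaT (a missing datetime value) for that entry instead of raising an exception:

```python
# Load library
import pandas as pd

# Hypothetical input: one valid string and one that does not match the format
mixed_strings = ['03-04-2005 11:35 PM', 'not a date']

# Unparseable values become NaT instead of raising an error
converted = pd.to_datetime(mixed_strings,
                           format='%d-%m-%Y %I:%M %p',
                           errors='coerce')

# Flag which entries failed to convert
print(converted.isna())
```

Coercing is convenient for messy data, but it can hide problems silently, so it is usually worth counting the NaT values after conversion.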