Chapter 7. Handling Dates and Times
7.0 Introduction
Dates and times (datetimes), such as the time of a particular sale or the date of a public health statistic, are frequently encountered during preprocessing for machine learning. Longitudinal data (or time series data) is data collected repeatedly for the same variables at multiple points in time. In this chapter, we will build a toolbox of strategies for handling time series data, including tackling time zones and creating lagged time features. Specifically, we will focus on the time series tools in the pandas library, which centralizes the functionality of many other general libraries, such as datetime.
7.1 Converting Strings to Dates
Problem
Given a vector of strings representing dates and times, you want to transform them into time series data.
Solution
Use pandas’ to_datetime with the format of the date and/or time specified in the format parameter:
# Load libraries
import numpy as np
import pandas as pd

# Create strings
date_strings = np.array(['03-04-2005 11:35 PM',
                         '23-05-2010 12:01 AM',
                         '04-09-2009 09:09 PM'])

# Convert to datetimes
[pd.to_datetime(date, format='%d-%m-%Y %I:%M %p') for date in date_strings]
[Timestamp('2005-04-03 23:35:00'), Timestamp('2010-05-23 00:01:00'), Timestamp('2009-09-04 21:09:00')]
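The list comprehension converts one string at a time. As a sketch of an alternative (not necessarily the approach the rest of the chapter uses), to_datetime also accepts the whole array at once and returns a DatetimeIndex, which exposes vectorized attributes such as hour:

```python
import numpy as np
import pandas as pd

# Same strings as in the solution above
date_strings = np.array(['03-04-2005 11:35 PM',
                         '23-05-2010 12:01 AM',
                         '04-09-2009 09:09 PM'])

# Passing the whole array converts every element in one call
# and returns a DatetimeIndex instead of a list of Timestamps
datetimes = pd.to_datetime(date_strings, format='%d-%m-%Y %I:%M %p')

print(datetimes)
# Vectorized access to components, e.g. the hour of each datetime
print(list(datetimes.hour))
```

Working with a single DatetimeIndex (or a Series, if you pass one) avoids a Python-level loop and keeps the results in a pandas structure that is convenient for later time series operations.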
We might also want to pass a value to the errors parameter to handle problems:
# Convert to datetimes
[pd.to_datetime(date, format="%d-%m-%Y %I:%M %p", errors="coerce")
 for date in date_strings]
[Timestamp('2005-04-03 ...
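Because every string in date_strings is well formed, errors="coerce" changes nothing above. The sketch below uses a hypothetical input containing one malformed string to show the difference: with errors="coerce" the bad entry becomes NaT (pandas' missing-datetime value), while the default, errors="raise", raises a ValueError.

```python
import pandas as pd

# Hypothetical input: the middle string does not match the format
date_strings = ['03-04-2005 11:35 PM', 'not a date', '04-09-2009 09:09 PM']
date_format = '%d-%m-%Y %I:%M %p'

# errors="coerce" turns unparseable entries into NaT instead of failing
datetimes = pd.to_datetime(date_strings, format=date_format, errors='coerce')
print(datetimes)

# The default, errors="raise", stops with a ValueError on the bad entry
raised = False
try:
    pd.to_datetime(date_strings, format=date_format)
except ValueError as exc:
    raised = True
    print('Raised:', exc)
```

Coercing is convenient when a few bad records should simply become missing values, but it can hide systematic format problems, so it is worth counting the resulting NaT values (for example with datetimes.isna().sum()) after conversion.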