Chapter 7. Handling Dates and Times
7.0 Introduction
Dates and times (datetimes) are frequently encountered during preprocessing for machine learning, whether it is the time of a particular sale or the year of some public health statistic. In this chapter, we will build a toolbox of strategies for handling time series data, including tackling time zones and creating lagged time features. Specifically, we will focus on the time series tools in the pandas library, which centralizes the functionality of many other libraries.
7.1 Converting Strings to Dates
Problem
Given a vector of strings representing dates and times, you want to transform them into time series data.
Solution
Use pandas’ to_datetime with the format of the date and/or time specified in the format parameter:
# Load libraries
import numpy as np
import pandas as pd

# Create strings
date_strings = np.array(['03-04-2005 11:35 PM',
                         '23-05-2010 12:01 AM',
                         '04-09-2009 09:09 PM'])

# Convert to datetimes
[pd.to_datetime(date, format='%d-%m-%Y %I:%M %p') for date in date_strings]
[Timestamp('2005-04-03 23:35:00'), Timestamp('2010-05-23 00:01:00'), Timestamp('2009-09-04 21:09:00')]
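Because to_datetime also accepts array-like input, the list comprehension can be replaced by a single call that returns a DatetimeIndex. The minimal sketch below shows that variant and spells out what each directive in the format string means. It also illustrates why the format matters: the same string read month-first instead of day-first produces a different date. The variable names fmt, datetimes, and month_first are illustrative only.

```python
import numpy as np
import pandas as pd

date_strings = np.array(['03-04-2005 11:35 PM',
                         '23-05-2010 12:01 AM',
                         '04-09-2009 09:09 PM'])

# %d = day of month, %m = month, %Y = four-digit year,
# %I = hour on a 12-hour clock, %M = minute, %p = AM/PM
fmt = '%d-%m-%Y %I:%M %p'

# Convert the whole array in one call; the result is a DatetimeIndex
datetimes = pd.to_datetime(date_strings, format=fmt)
print(datetimes)

# The same string parsed month-first yields March 4 instead of April 3
month_first = pd.to_datetime('03-04-2005 11:35 PM', format='%m-%d-%Y %I:%M %p')
print(month_first)
```

Specifying the format explicitly removes this day/month ambiguity instead of leaving pandas to guess.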
We might also want to pass a value to the errors parameter to handle problems:
# Convert to datetimes
[pd.to_datetime(date, format="%d-%m-%Y %I:%M %p", errors="coerce")
 for date in date_strings]
[Timestamp('2005-04-03 23:35:00'), Timestamp('2010-05-23 00:01:00'), Timestamp('2009-09-04 21:09:00')]
If errors="coerce", then any problem that occurs ...
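The sample strings above all parse cleanly, so the effect of errors="coerce" is not visible in that output. The minimal sketch below uses made-up malformed inputs: an impossible month 13 and a non-date string. With errors="coerce", pandas turns values it cannot parse into NaT (pandas' missing-datetime marker). With the default errors="raise", the same call raises a ValueError. The names messy_strings, parsed, and raised are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical messy input: month 13 is invalid and the last entry is not a date
messy_strings = np.array(['03-04-2005 11:35 PM',
                          '23-13-2010 12:01 AM',
                          'not a date'])

fmt = '%d-%m-%Y %I:%M %p'

# errors="coerce": unparseable entries become NaT instead of stopping the conversion
parsed = pd.to_datetime(messy_strings, format=fmt, errors="coerce")
print(parsed)

# Default errors="raise": the first bad entry raises a ValueError
raised = False
try:
    pd.to_datetime(messy_strings, format=fmt)
except ValueError:
    raised = True
print("Default behavior raised ValueError:", raised)
```

Coercing lets you keep the valid rows and deal with the NaT values later, for example by counting or imputing them.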