CHAPTER 8Date and Time Functions

Data scientists use date and time functions many different ways in our queries. We may use two dates to calculate a duration, for example. Many machine learning algorithms are “trained” to identify patterns in data from the past and use those patterns to predict future outcomes. In order to build a dataset for that purpose, we have to be able to filter queries by time range.

Often, datasets that are built for predictive models include summaries of activities within dynamic date ranges—for example, a count of some activity occurrence during each of the past three months. Or, in the case of time-series analysis, an input dataset might include one row per time period (hour, day, week, month) with a count of something associated with each time period; for example, the number of patients a doctor sees per week.

Many predictive models are time-bound. For example, the question “Will this first-time customer become a repeat customer?” will be further refined as “What is the likelihood that each first-time customer at today's farmer’s market will return and make a second purchase within the next month?” To answer this question, we could create a dataset with a row for every customer, columns containing data values as of the time of their first purchase, and a binary “target variable” that indicates whether that customer made another purchase within a month of their first purchase date.

Let's look at some different ways to work with date and time values ...

Get SQL for Data Scientists now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.