Chapter 6. Time Series Analysis with pandas

A time series is a series of data points along a time-based axis. Time series play a central role in many different scenarios: traders use historical stock prices to calculate risk measures, while weather forecasts are based on time series generated by sensors measuring temperature, humidity, and air pressure. The digital marketing department, in turn, relies on time series generated by web pages, e.g., the source and number of page views per hour, and uses them to draw conclusions about its marketing campaigns.

Time series analysis is one of the main driving forces behind data scientists and analysts looking for a better alternative to Excel. The following points summarize some of the reasons for this move:

Big datasets

Time series can quickly grow beyond Excel’s limit of roughly one million rows per sheet. For example, if you work with intraday stock prices on a tick data level, you’re often dealing with hundreds of thousands of records—per stock and day!
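As a rough illustration of the size argument, the following sketch builds a synthetic tick series with more rows than fit on a single Excel sheet (1,048,576). The prices are random numbers and the names `EXCEL_MAX_ROWS`, `n_ticks`, and `ticks` are made up for this example; it is not real market data.

```python
import numpy as np
import pandas as pd

EXCEL_MAX_ROWS = 1_048_576  # row limit of a single Excel worksheet

# Hypothetical example: two million synthetic "ticks", one per millisecond
n_ticks = 2_000_000
index = pd.date_range("2020-01-31 09:30", periods=n_ticks, freq="ms")

# Random-walk prices, for illustration only
rng = np.random.default_rng(seed=0)
prices = 100 + rng.normal(loc=0, scale=0.01, size=n_ticks).cumsum()

ticks = pd.DataFrame({"price": prices}, index=index)

# The DataFrame holds more rows than an Excel sheet could
print(len(ticks) > EXCEL_MAX_ROWS)
```

A DataFrame of this size takes up only a few dozen megabytes of memory, so it can still be sliced and aggregated interactively.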

Date and time

As we have seen in Chapter 3, Excel has various limitations when it comes to handling date and time, the backbone of time series. Missing support for time zones and a number format that is limited to milliseconds are some of them. pandas supports time zones and uses NumPy’s datetime64[ns] data type, which offers a resolution down to nanoseconds.
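The two pandas features mentioned here can be tried out with a minimal sketch like the one below. The timestamp and the choice of time zone are arbitrary values picked for this example.

```python
import pandas as pd

# A timestamp with nanosecond precision (arbitrary example value)
ts = pd.Timestamp("2020-01-31 09:30:00.123456789")
print(ts.nanosecond)  # the last three digits of the fractional second

# Attach a time zone to the naive timestamp, then convert it to UTC
ts_ny = ts.tz_localize("America/New_York")
ts_utc = ts_ny.tz_convert("UTC")
print(ts_ny)
print(ts_utc)
```

`tz_localize` interprets the naive timestamp as local time in the given zone, while `tz_convert` expresses the same moment in another zone. Both methods are also available on a DataFrame or Series with a DatetimeIndex.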

Missing functionality

Excel misses even basic tools to be able to work with time series data in a decent ...
