In this chapter we discuss problems that might arise while you are preprocessing time series data. Some of these problems will be familiar to experienced data analysts, but there are specific difficulties posed by timestamps. As with any data analysis task, cleaning and properly processing data is often the most important step of a time series pipeline. Fancy techniques can’t fix messy data.
Most data analysts will need to find, align, scrub, and smooth their own data either to learn time series analysis or to do meaningful work in their organizations. As you prepare data, you’ll need to do a variety of tasks, from joining disparate columns, to resampling irregularly sampled data and handling missing values, to aligning time series with different time axes. This chapter helps you along the path to an interesting and properly prepared time series data set.
We discuss the following skills useful for finding and cleaning up time series data:
Finding time series data from online repositories
Discovering and preparing time series data from sources not originally intended for time series
Addressing common conundrums you will encounter with time series data, especially the difficulties that arise from timestamps
After reading this chapter, you will have the skills needed to identify and prepare interesting sources of time series data for downstream analysis.
If you are interested in where to find time series data and how to clean it, ...