Chapter 5. Storing Temporal Data
Often, the value of time series data lies in retrospective analysis rather than in live streaming scenarios. For this reason, most time series analysis depends on stored data.
A good storage solution makes data easy to access and keeps it reliable without requiring a major investment of computing resources. In this chapter, we will discuss which aspects of a data set you should consider when designing time series data storage. We will also discuss the advantages of SQL databases, NoSQL databases, and a variety of flat file formats.
Designing a general time series storage solution is a challenge because there are so many different kinds of time series data, each with different storage, read/write, and analysis patterns. Some data will be stored and examined repeatedly, whereas other data is useful only for a short period of time, after which it can be deleted altogether.
Here are a few use cases for time series storage that have different read, write, and query patterns:
You are collecting performance metrics on a production system. You need to store these performance metrics for years at a time, but the older the data gets, the less detailed it needs to be. Hence you need a form of storage that will automatically downsample and cull data as information ages.
You have access to a remote open source time series data repository, but you need to keep a local copy on your computer to cut down on network traffic. The ...