Often, the value of time series data lies in retrospective analysis rather than in live streaming scenarios. For this reason, storing time series data is necessary for most time series analysis.
A good storage solution makes data easy to access and reliable without requiring a major investment of computing resources. In this chapter, we will discuss which aspects of a data set you should consider when designing time series data storage. We will also discuss the relative advantages of SQL databases, NoSQL databases, and a variety of flat file formats.
Designing a general time series storage solution is a challenge because there are so many different kinds of time series data, each with different storage, read/write, and analysis patterns. Some data is stored and examined repeatedly, whereas other data is useful only for a short period of time, after which it can be deleted altogether.
Here are a few use cases for time series storage that have different read, write, and query patterns:
You are collecting performance metrics on a production system. You need to store these performance metrics for years at a time, but the older the data gets, the less detailed it needs to be. Hence you need a form of storage that will automatically downsample and cull data as information ages.
You have access to a remote open source time series data repository, but you need to keep a local copy on your computer to cut down on network traffic. The ...