Chapter 1. Time Series: An Overview and a Quick History
Time series data and its analysis are increasingly important due to the massive production of such data through, for example, the internet of things, the digitalization of healthcare, and the rise of smart cities. In the coming years we can expect the quantity, quality, and importance of time series data to grow rapidly.
As continuous monitoring and data collection become more common, the need for competent time series analysis with both statistical and machine learning techniques will increase. Indeed, the most promising new models combine both of these methodologies. For this reason, we will discuss each at length. We will study and use a broad range of time series techniques useful for analyzing and predicting human behavior, scientific phenomena, and private sector data, as all these areas offer rich arrays of time series data.
Let’s start with a definition. Time series analysis is the endeavor of extracting meaningful summary and statistical information from points arranged in chronological order. It is done to diagnose past behavior as well as to predict future behavior. In this book we will use a variety of approaches, ranging from hundred-year-old statistical models to newly developed neural network architectures.
None of the techniques has developed in a vacuum or out of purely theoretical interest. Innovations in time series analysis result from new ways of collecting, recording, and visualizing data. Next we briefly discuss the emergence of time series analysis in a variety of applications.
The History of Time Series in Diverse Applications
Time series analysis often comes down to the question of causality: how did the past influence the future? At times, such questions (and their answers) are treated strictly within their discipline rather than as part of the general discipline of time series analysis. As a result, a variety of disciplines have contributed novel ways of thinking about time series data sets.
In this section we will survey a few historical examples of time series data and analysis in these disciplines:

Medicine

Weather

Economics

Astronomy
As we will see, the pace of development in these disciplines and the contributions originating in each field were strongly tied to the nature of the contemporaneous time series data available.
Medicine as a Time Series Problem
Medicine is a data-driven field that has contributed interesting time series analysis to human knowledge for a few centuries. Now, let’s study a few examples of time series data sources in medicine and how they emerged over time.
Medicine got a surprisingly slow start to thinking about the mathematics of predicting the future, despite the fact that prognoses are an essential part of medical practice. This was the case for many reasons. Statistics and a probabilistic way of thinking about the world are recent phenomena, and these disciplines were not available for many centuries even as the practice of medicine developed. Also, most doctors practiced in isolation, without easy professional communication and without a formal recordkeeping infrastructure for patient or population health. Hence, even if physicians in earlier times had been trained as statistical thinkers, they likely wouldn’t have had reasonable data from which to draw conclusions.
This is not at all to criticize early physicians but to explain why it is not too surprising that one of the early time series innovations in population health came from a seller of hats rather than from a physician. When you think about it, this makes sense: in earlier centuries, an urban hat seller would likely have had more practice in recordkeeping and the art of spotting trends than would a physician.
The innovator was John Graunt, a 17th-century London haberdasher. Graunt undertook a study of the death records that had been kept in London parishes since the early 1500s. In doing so, he originated the discipline of demography. In 1662, he published Natural and Political Observations . . . Made upon the Bills of Mortality (see Figure 1-1).
In this book, Graunt presented the first life tables, which you may know as actuarial tables. These tables show the probability that a person of a given age will die before their next birthday. Graunt, as the first person known to have formulated and published life tables, was also the first documented statistician of human health. His life tables looked something like Table 1-1, which is taken from a set of Rice University statistics course notes.
Age (years) | Proportion of deaths in the interval | Proportion surviving until start of interval
0–6 | 0.36 | 1.00
7–16 | 0.24 | 0.64
17–26 | 0.15 | 0.40
27–36 | 0.09 | 0.25
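The two columns of Table 1-1 are linked by simple arithmetic. Both are fractions of the original birth cohort, so the proportion surviving to the start of an interval is 1.0 minus all deaths in earlier intervals. The Python sketch below reproduces the third column from the second and also computes the conditional probability of dying within an interval given survival to its start, an interval-level analogue of the per-year probability described above. The function names are hypothetical and chosen only for illustration; the numbers come straight from the table.

```python
from itertools import accumulate

# Values copied from Table 1-1; each is a fraction of the original birth cohort.
age_intervals = ["0-6", "7-16", "17-26", "27-36"]
deaths_in_interval = [0.36, 0.24, 0.15, 0.09]


def surviving_at_start(deaths):
    """Cohort fraction still alive at the start of each interval."""
    # Deaths before interval k are the running total of intervals 0..k-1.
    prior_deaths = [0.0] + list(accumulate(deaths))[:-1]
    return [1.0 - d for d in prior_deaths]


def conditional_death_prob(deaths):
    """Chance of dying within an interval, given survival to its start."""
    alive = surviving_at_start(deaths)
    return [d / a for d, a in zip(deaths, alive)]


survivors = surviving_at_start(deaths_in_interval)
conditional = conditional_death_prob(deaths_in_interval)
for age, s, q in zip(age_intervals, survivors, conditional):
    print(f"{age:>5}  surviving at start: {s:.2f}  P(die in interval | alive): {q:.3f}")
```

The conditional risk (0.36, 0.375, 0.375, 0.36) stays roughly flat even though the raw death proportions fall steadily; the raw column shrinks mainly because fewer people remain alive to die.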
Unfortunately, Graunt’s way of thinking mathematically about human survival did not take. A more connected and data-driven world began to form (complete with nation states, accreditation, professional societies, scientific journals, and, much later, government-mandated health recordkeeping), but medicine continued to focus on physiology rather than statistics.
There were understandable reasons for this. First, the study of anatomy and physiology in small numbers of subjects had provided the major advances in medicine for centuries, and most humans (even scientists) hew to what works for them as long as possible. While a focus on physiology was so successful, there was no reason to look further afield. Second, there was very little reporting infrastructure in place for physicians to tabulate and share information on the scale that would make statistical methods superior to clinical observations.
Time series analysis has been even slower to come into mainstream medicine than other branches of statistics and data analysis, likely because time series analysis is more demanding of recordkeeping systems. Records must be linked together over time, and preferably collected at regular intervals. For this reason, time series as an epidemiological practice has only emerged very recently and incrementally, once sufficient governmental and scientific infrastructure was in place to ensure reasonably good and lengthy temporal records.
Likewise, individualized healthcare using time series analysis remains a young and challenging field because it can be quite difficult to create data sets that are consistent over time. Even for small case-study-based research, maintaining both contact with and participation from a group of individuals is excruciatingly difficult and expensive. When such studies are conducted for long periods of time, they tend to become canonical in their fields (and repeatedly, or even excessively, researched) because their data can address important questions despite the challenges of funding and management.^{1}
Medical instruments
Time series analysis for individual patients has a far earlier and more successful history than that of population-level health studies. Time series analysis made its way into medicine when the first practical electrocardiograms (ECGs), which can diagnose cardiac conditions by recording the electrical signals passing through the heart, were invented in 1901 (see Figure 1-2). Another time series machine, the electroencephalogram (EEG), which noninvasively measures electrical impulses in the brain, was introduced into medicine in 1924, creating more opportunities for medical practitioners to apply time series analysis to medical diagnosis (see Figure 1-3).
Both of these time series machines were part of a larger trend of enhancing medicine with repurposed ideas and technologies coming out of the second Industrial Revolution.
ECG and EEG time series classification tools remain active areas of research for very practical purposes, such as estimating the risk of a sudden cardiac crisis or a seizure. These measurements are rich sources of data, but one “problem” with such data is that it tends to be available only for patients with specific ailments. These machines do not generate long-range time series that can tell us more broadly about human health and behavior, as their measurements are seldom applied for long periods of time or before a disease has emerged in a patient.
Luckily, from a data analysis point of view, we are moving past the era where ECGs and the like are the dominant medical time series available. With the advent of wearable sensors and “smart” electronic medical devices, many healthy humans take routine measurements automatically or with minimal manual input, leading to the ongoing collection of good longitudinal data about both sick and healthy people. This is in stark contrast to the last century’s medical time series data, which was almost exclusively measured on sick people and which was very limited in access.
As recent news coverage has shown, a variety of nontraditional players are entering the medical field, ranging from enormous social media companies to financial institutions to retail giants.^{2} They likely all plan to use large data sets to streamline healthcare. There aren’t just new players in the healthcare field; there are also new techniques. Personalized, DNA-driven medicine means that time series data is increasingly measured and valued. Thanks to burgeoning modern healthcare data sets, both healthcare and time series analysis will likely evolve in the coming years, particularly in response to the lucrative data sets of the healthcare sector. Hopefully this will happen in such a way that time series analysis can benefit everyone.
Forecasting Weather
For obvious reasons, predicting the weather has long been a preoccupation for many. The ancient Greek philosopher Aristotle delved into weather with an entire treatise (Meteorology), and his ideas about the causes and sequencing of the weather remained dominant until the Renaissance. At that time, scientists began to collect weather-related data with the help of newly invented instruments, such as the barometer, to measure the state of the atmosphere. They used these instruments to record time series at daily or even hourly intervals. The records were kept in a variety of locations, including private diaries and local town logbooks. For centuries this remained the only way that Western civilization tracked the weather.
Greater formalization and infrastructure for weather recording arrived in the 1850s when Robert FitzRoy was appointed the head of a new British government department to record and publish weather-related data for sailors.^{3} FitzRoy coined the term weather forecast. At the time, he was criticized for the quality of his forecasts, but he is now regarded as having been well ahead of his time in the science he used to develop them. He established the custom of printing weather forecasts in the newspaper; they were the first forecasts printed in The Times of London. FitzRoy is now celebrated as the “father of forecasting.”
In the late 19th century—hundreds of years after many atmospheric measurements had come into use—the telegraph allowed for fast compilations of atmospheric conditions in time series from many different locations. This practice became standard in many parts of the world by the 1870s and led to the creation of the first meaningful data sets for predicting local weather based on what was happening in other geographic locations.
By the turn of the 20th century, the idea of forecasting the weather with computational methods was vigorously pursued with the help of these compiled data sets. Early endeavors at computing the weather required a spectacular amount of effort but gave poor results. While physicists and chemists had well-proven ideas about the relevant natural laws, there were too many natural laws to apply all at once. The resulting system of equations was so complex that it was a notable scientific breakthrough the first time someone even attempted to do the calculations.
Several decades of research followed to simplify the physical equations in a way that increased accuracy and computational efficiency. These tricks of the trade have been handed down even to current weather prediction models, which operate on a mix of known physical principles and proven heuristics.
Nowadays many governments make highly granular weather measurements from hundreds or even thousands of weather stations around the world, and the resulting forecasts are grounded in data with precise information about weather station locations and equipment. The roots of these efforts trace back to the coordinated data sets of the 1870s and even earlier to the Renaissance practice of keeping local weather diaries.
Unfortunately, weather forecasting also illustrates how the increasing attacks on science reach even into the domain of time series forecasting. Not only have time series debates about global temperatures been politicized, but so have more mundane time series forecasting tasks, such as predicting the path of a hurricane.
Forecasting Economic Growth
Indicators of production and efficiency in markets have long provided interesting data to study from a time series perspective. Most interesting and urgent has been the question of forecasting future economic states based on the past. Such forecasts aren’t merely useful for making money; they also help promote prosperity and avert social catastrophes. Let’s discuss some important developments in the history of economic forecasting.
Economic forecasting grew out of the anxiety triggered by episodic banking crises in the United States and Europe in the late 19th and early 20th centuries. At that time, entrepreneurs and researchers alike drew inspiration from the idea that the economy could be likened to a cyclical system, just as the weather was thought to behave. With the right measurements, it was thought, predictions could be made and crashes averted.
Even the language of early economic forecasting mirrored the language of weather forecasting. This was unintentionally apt. In the early 20th century, economic and weather forecasting were indeed alike: both were pretty terrible. But economists’ aspirations created an environment in which progress could at least be hoped for, and so a variety of public and private institutions were formed for tracking economic data. Early economic forecasting efforts led to the creation of economic indicators and tabulated, publicly available histories of those indicators that are still in use today. We will even use some of these in this book.
Nowadays, the United States and most other nations have thousands of government researchers and recordkeepers whose jobs are to record data as accurately as possible and make it available to the public (see Figure 1-4). This practice has proven invaluable to economic growth and the avoidance of economic catastrophe and painful boom and bust cycles. What’s more, businesses benefit from a data-rich atmosphere, as these public data sets permit transportation providers, manufacturers, small business owners, and even farmers to anticipate likely future market conditions. This all grew out of the attempt to identify “business cycles” that were thought to be the causes of cyclical banking failures, an early form of time series analysis in economics.
Much of the economic data collected by the government, particularly the most newsworthy, tends to be a proxy for the population’s overall economic well-being. One example of such vital information comes from the number of people requesting unemployment benefits. Other examples include the government’s estimates of the gross domestic product and of the total tax returns received in a given year.
Thanks to this desire for economic forecasting, the government has become a curator of data as well as a collector of taxes. The collection of this data enabled modern economics, the modern finance industry, and data science generally to blossom. Thanks to time series analysis growing out of economic questions, we now safely avert many more banking and financial crises than any government could have in past centuries. Also, hundreds of economics textbooks, which are in effect time series textbooks, have been devoted to understanding the rhythms of these financial indicators.
Trading markets
Let’s get back to the historical side of things. As government efforts at data collection met with great success, private organizations began to copy government recordkeeping. Over time, commodities and stock exchanges became increasingly technical. Financial almanacs became popular, too. This happened both because market participants became more sophisticated and because emerging technologies enabled greater automation and new ways of competing and thinking about prices.
All this minute recordkeeping gave rise to the pursuit of making money off the markets via math rather than intuition, in a way driven entirely by statistics (and, more recently, by machine learning). Early pioneers did this mathematical work by hand, whereas current “quants” do this by very complicated and proprietary time series analytic methods.
One of the pioneers of mechanical trading, or time series forecasting via algorithm, was Richard Dennis. Dennis was a self-made millionaire who famously turned ordinary people, called the Turtles, into star traders by teaching them a few select rules about how and when to trade. These rules were developed in the 1970s and 1980s and mirrored the “AI” thinking of the 1980s, in which heuristics still strongly ruled the paradigm of how to build intelligent machines to work in the real world.
Since then many “mechanical” traders have adapted these rules, which as a result have become less profitable in a crowded automated market. Mechanical traders continue to grow in number and wealth, but because there is so much competition, they are continually in search of the next best thing.
Astronomy
Astronomy has always relied heavily on plotting objects, trajectories, and measurements over time. For this reason, astronomers are masters of time series, both for calibrating instruments and for studying their objects of interest. As an example of the long history of time series data, consider that sunspot time series were recorded in ancient China as early as 800 BC, making sunspots one of the best-recorded natural phenomena ever.
Some of the most exciting astronomy of the past century relates to time series analysis. The discovery of variable stars (which can be used to deduce galactic distances) and the observation of transitory events such as supernovae (which enhance our understanding of how the universe changes over time) are the result of monitoring live streams of time series data based on the wavelengths and intensities of light. Time series have had a fundamental impact on what we can know and measure about the universe.
Incidentally, this monitoring of astronomical images has even allowed astronomers to catch events as they are happening (or rather as we are able to observe them, which may take millions of years).
In the last few decades, the availability of explicitly timestamped data, as formal time series, has exploded in astronomy with a wide array of new kinds of telescopes collecting all sorts of celestial data. Some astronomers have even referred to a time series “data deluge.”
Time Series Analysis Takes Off
George Box, a pioneering statistician who helped develop a popular time series model, was a great pragmatist. He famously said, “All models are wrong, but some are useful.”
Box made this statement in response to a common attitude that proper time series modeling was a matter of finding the best model to fit the data. As he explained, it is very unlikely that any model can describe the real world exactly. Box made this pronouncement in 1978, which seems bizarrely late in the history of a field as important as time series analysis, but in fact the formal discipline was surprisingly young.
For example, the Box-Jenkins method, one of the achievements that made George Box famous and a fundamental contribution to time series analysis, appeared only in 1970.^{4} Interestingly, this method first appeared not in an academic journal but rather in a statistics textbook, Time Series Analysis: Forecasting and Control (Wiley). Incidentally, this textbook remains popular and is now in its fifth edition.
The original Box-Jenkins model was applied to a data set of carbon dioxide levels emitted from a gas furnace. While there is nothing quaint about a gas furnace, the 300-point data set that was used to demonstrate the method does feel somewhat outmoded. Certainly, larger data sets were available in the 1970s, but remember that they were exceptionally difficult to work with then. This was a time that predated conveniences such as R, Python, and even C++. Researchers had good reasons to focus on small data sets and methods that minimized computing resources.
Time series analysis and forecasting developed as computers did, with larger data sets and easier coding tools paving the way for more experimentation and the ability to answer more interesting questions. Professor Rob Hyndman’s history of forecasting competitions provides apt examples of how time series forecasting competitions developed at a rate parallel to that of computers.
Professor Hyndman places the “earliest nontrivial study of time series forecast accuracy” as occurring in a 1969 doctoral dissertation at the University of Nottingham, just a year before the publication of the Box-Jenkins method. That first effort was soon followed by organized time series forecasting competitions, the earliest ones featuring around 100 data sets in the early 1970s.^{5} Not bad, but surely something that could be done by hand if absolutely necessary.
By the end of the 1970s, researchers had put together a competition with around 1,000 data sets, an impressive scaling up. Incidentally, this era was also marked by the first commercial microprocessor, the development of floppy disks, Apple’s early personal computers, and the computer language Pascal. It’s likely some of these innovations were helpful. A time series forecasting competition of the late 1990s included 3,000 data sets. While these collections of data sets were substantial and no doubt reflected tremendous amounts of work and ingenuity to collect and curate, they are dwarfed by the amount of data now available. Time series data is everywhere, and soon everything will be a time series.
This rapid growth in the size and quality of data sets owes its origins to the tremendous advances that have been made in computing in the past few decades. Hardware engineers succeeded in continuing the trend described by Moore’s Law (a prediction of exponential growth in computing capacity) during this time. As hardware became smaller, more powerful, and more efficient, it became easy and affordable to have much more of it, enough to create everything from miniature portable computers with attached sensors to massive data centers powering the modern internet in its data-hungry form. Most recently, wearables, machine learning techniques, and GPUs have revolutionized the quantity and quality of data available for study.^{6}
Time series analysis will no doubt benefit as computing power increases because many aspects of time series data are computationally demanding. With ramped-up computational and data resources, time series analysis can be expected to continue its rapid pace of development.
The Origins of Statistical Time Series Analysis
Statistics is a very young science. Progress in statistics, data analysis, and time series has always depended strongly on when, where, and how data was available and in what quantity. The emergence of time series analysis as a discipline is linked not only to developments in probability theory but equally to the development of stable nation states, where recordkeeping first became a realizable and interesting goal. We covered this earlier with respect to a variety of disciplines. Now we’ll think about time series itself as a discipline.
One benchmark for the beginning of time series analysis as a discipline is the application of autoregressive models to real data. This didn’t happen until the 1920s. Udny Yule, an experimental physicist turned statistical lecturer at Cambridge University, applied an autoregressive model to sunspot data, offering a novel way to think about the data in contrast to methods designed to fit the frequency of an oscillation. Yule pointed out that an autoregressive model does not start from the assumption of periodicity:
When periodogram analysis is applied to data respecting any physical phenomenon in the expectation of eliciting one or more true periodicities, there is usually, as it seems to me, a tendency to start from the initial hypothesis that the periodicity or periodicities are masked solely by such more or less random superposed fluctuations—fluctuations which do not in any way disturb the steady course of the underlying periodic function or functions…there seems no reason for assuming it to be the hypothesis most likely a priori.
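To make the contrast concrete: an autoregressive model describes each value as a weighted combination of its own recent past plus a random shock, so a series can show irregular, cycle-like swings without any fixed underlying periodic function. The Python sketch below is an illustration under stated assumptions, not Yule’s own computation. It simulates a zero-mean AR(2) series with hypothetical coefficients (phi1 = 1.3, phi2 = −0.6), recovers them by ordinary least squares, and reports the pseudo-period implied by the complex characteristic roots, 2π / arccos(phi1 / (2·sqrt(−phi2))). Real sunspot numbers would first need their mean removed, and all function names here are hypothetical.

```python
import math
import random


def simulate_ar2(phi1, phi2, n, sigma=1.0, seed=0, burn_in=100):
    """Simulate a zero-mean AR(2): x_t = phi1*x_{t-1} + phi2*x_{t-2} + e_t."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n + burn_in):
        shock = rng.gauss(0.0, sigma)  # e_t: the random "disturbance"
        x.append(phi1 * x[-1] + phi2 * x[-2] + shock)
    return x[-n:]  # drop the start-up values


def fit_ar2(x):
    """Ordinary least-squares estimates of (phi1, phi2) for a zero-mean AR(2)."""
    # Accumulate the 2x2 normal equations for regressing x_t on x_{t-1}, x_{t-2}.
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(x)):
        lag1, lag2, y = x[t - 1], x[t - 2], x[t]
        s11 += lag1 * lag1
        s12 += lag1 * lag2
        s22 += lag2 * lag2
        b1 += lag1 * y
        b2 += lag2 * y
    det = s11 * s22 - s12 * s12
    # Solve [[s11, s12], [s12, s22]] @ [phi1, phi2] = [b1, b2] by Cramer's rule.
    phi1 = (b1 * s22 - s12 * b2) / det
    phi2 = (s11 * b2 - s12 * b1) / det
    return phi1, phi2


def pseudo_period(phi1, phi2):
    """Average cycle length implied by complex characteristic roots.

    Valid only when phi1**2 + 4*phi2 < 0 (complex roots, so phi2 < 0).
    """
    cos_theta = phi1 / (2.0 * math.sqrt(-phi2))
    return 2.0 * math.pi / math.acos(cos_theta)


if __name__ == "__main__":
    series = simulate_ar2(1.3, -0.6, n=2000)
    est1, est2 = fit_ar2(series)
    print(f"estimated phi1={est1:.3f}, phi2={est2:.3f}")
    print(f"implied pseudo-period: {pseudo_period(1.3, -0.6):.1f} steps")
```

With these coefficients the implied pseudo-period is roughly 11 steps, yet the simulated series never repeats exactly, because each new shock perturbs the phase and amplitude of the swings. That is the kind of "disturbed" behavior that a steady periodic function plus superposed noise cannot capture.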
Yule’s thinking was his own, but it’s likely that some historical influences led him to notice that the traditional model presupposed its own outcome. As a former experimental physicist who had worked abroad in Germany (the epicenter for the burgeoning theory of quantum mechanics), Yule would certainly have been aware of the recent developments that highlighted the probabilistic nature of quantum mechanics. He also would have recognized the dangers of narrowing one’s thinking to a model that presupposes too much, as classical physicists had done before the discovery of quantum mechanics.
As the world became a more orderly, recorded, and predictable place, particularly after World War II, early problems in practical time series analysis were presented by the business sector. Business-oriented time series problems were important and not overly theoretical in their origins. These included forecasting demand, estimating future raw materials prices, and hedging manufacturing costs. In these industrial use cases, techniques were adopted when they worked and rejected when they didn’t. It probably helped that industrial workers had access to larger data sets than were available to academics at the time (as continues to be the case now). This meant that sometimes practical but theoretically underexplored techniques came into widespread use before they were well understood.
The Origins of Machine Learning Time Series Analysis
Early machine learning in time series analysis dates back many decades. An oft-cited paper from 1969, “The Combination of Forecasts,” analyzed the idea of combining forecasts rather than choosing a “best one” as a way to improve forecast performance. This idea was, at first, abhorrent to traditional statisticians, but ensemble methods have come to be the gold standard in many forecasting problems. Ensembling rejects the idea of a perfect or even significantly superior forecasting model relative to all possible models.
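The benefit of combining forecasts is easy to demonstrate: if two forecasts make errors that partly cancel, their average can beat both. The Python sketch below uses small made-up series (purely illustrative, not data from the 1969 paper) to compare an equal-weight average with a weighting inversely proportional to each forecast’s mean squared error. The inverse-MSE scheme is an assumption chosen for simplicity, loosely in the spirit of weighting by past performance, and is not a reproduction of Bates and Granger’s exact method.

```python
def mse(actual, forecast):
    """Mean squared error between two equal-length sequences."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def combine(forecasts, weights):
    """Weighted average of several forecast series, point by point."""
    return [
        sum(w * series[t] for w, series in zip(weights, forecasts))
        for t in range(len(forecasts[0]))
    ]


def inverse_mse_weights(actual, forecasts):
    """Weight each forecast by 1 / MSE, normalized so the weights sum to 1.

    Assumes no forecast is perfect (an MSE of zero would divide by zero).
    """
    inverse = [1.0 / mse(actual, f) for f in forecasts]
    total = sum(inverse)
    return [v / total for v in inverse]


# Hypothetical toy data: forecast_a runs high, forecast_b runs low.
actual = [10.0, 12.0, 11.0, 13.0]
forecast_a = [11.0, 13.0, 12.0, 14.0]
forecast_b = [9.0, 12.0, 10.0, 13.0]

equal = combine([forecast_a, forecast_b], [0.5, 0.5])
weights = inverse_mse_weights(actual, [forecast_a, forecast_b])
weighted = combine([forecast_a, forecast_b], weights)

print("MSE a:", mse(actual, forecast_a))         # 1.0
print("MSE b:", mse(actual, forecast_b))         # 0.5
print("MSE equal average:", mse(actual, equal))  # 0.125
print("weights:", weights)
print("MSE weighted:", mse(actual, weighted))
```

Here the weights are computed and evaluated on the same four points, which flatters the result; in practice you would estimate the weights on past errors and apply them to future forecasts.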
More recently, practical uses for time series analysis and machine learning emerged as early as the 1980s, and included a wide variety of scenarios:

Computer security specialists proposed anomaly detection as a method of identifying hackers/intrusions.

Dynamic time warping, one of the dominant methods for “measuring” the similarity of time series, came into use because computing power finally allowed reasonably fast computation of “distances,” say, between different audio recordings.

Recurrent neural networks were invented and shown to be useful for extracting patterns from corrupted data.
Time series analysis and forecasting have yet to reach their golden period, and, to date, time series analysis remains dominated by traditional statistical methods as well as simpler machine learning techniques, such as ensembles of trees and linear fits. We are still waiting for a great leap forward in predicting the future.
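Dynamic time warping, mentioned in the list above, is compact enough to sketch. It finds the cheapest alignment between two sequences when either may be locally stretched or compressed in time, which is why it suits recordings of the same word spoken at different speeds. The Python function below is the textbook dynamic-programming version, assuming an absolute-difference local cost and no warping-window constraint; practical implementations often add a window (such as a Sakoe-Chiba band) to cut the quadratic cost. The helper names and example sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses |x - y| as the local cost and allows three moves per step:
    advance in a only, advance in b only, or advance in both.
    Runs in O(len(a) * len(b)) time and memory.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only (b[j-1] is reused)
                cost[i][j - 1],      # advance in b only (a[i-1] is reused)
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def lockstep_distance(a, b):
    """Sum of point-by-point absolute differences (equal lengths only)."""
    return sum(abs(x - y) for x, y in zip(a, b))


pulse = [0, 1, 2, 3, 2, 1, 0]
delayed = [0, 0, 1, 2, 3, 2, 1]  # same shape, shifted one step later

print(lockstep_distance(pulse, delayed))  # 6
print(dtw_distance(pulse, delayed))       # 1.0
```

The lockstep comparison penalizes nearly every point of the delayed pulse, while DTW lets the first 0 of `pulse` match both leading zeros of `delayed`, leaving a cost of 1 only at the end, where the pulse has returned to 0 but the delayed copy has not.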
More Resources

On the history of time series analysis and forecasting:
 Kenneth F. Wallis, “Revisiting Francis Galton’s Forecasting Competition,” Statistical Science 29, no. 3 (2014): 420–24, https://perma.cc/FJ6V8HUY.
This is a historical and statistical discussion of a very early paper on forecasting the weight of a butchered ox while the animal was still alive at a county fair.
 G. Udny Yule, “On a Method of Investigating Periodicities in Disturbed Series, with Special Reference to Wolfer’s Sunspot Numbers,” Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character 226 (1927): 267–98, https://perma.cc/D6SL7UZS.
Udny Yule’s seminal paper, one of the first applications of autoregressive analysis to real data, illustrates a way to remove the assumption of periodicity from analysis of a putatively periodic phenomenon.
 J. M. Bates and C. W. J. Granger, “The Combination of Forecasts,” Operational Research Quarterly 20, no. 4 (1969): 451–68, https://perma.cc/9AEEQZ2J.
This seminal paper describes the use of ensembling for time series forecasting. The idea that averaging models was better for forecasting than looking for a perfect model was both new and controversial to many traditional statisticians.
 Jan De Gooijer and Rob Hyndman, “25 Years of Time Series Forecasting,” International Journal of Forecasting 22, no. 3 (2006): 443–73, https://perma.cc/84RG58BU.
This is a thorough statistical summary of time series forecasting research in the 20th century.
 Rob Hyndman, “A Brief History of Time Series Forecasting Competitions,” Hyndsight blog, April 11, 2018, https://perma.cc/32LJRFJW.
This shorter and more specific history gives specific numbers, locations, and authors of prominent time series forecasting competitions in the last 50 years.

On domain-specific time series histories and commentary:
 NASA, “Weather Forecasting Through the Ages,” Nasa.gov, February 22, 2002, https://perma.cc/8GK5JAVT.
NASA gives a history of how weather forecasting came to be, with emphasis on specific research challenges and successes in the 20th century.
 Richard C. Cornes, “Early Meteorological Data from London and Paris: Extending the North Atlantic Oscillation Series,” PhD diss., School of Environmental Sciences, University of East Anglia, Norwich, UK, May 2010, https://perma.cc/NJ33WVXH.
This doctoral thesis offers a fascinating account of the kinds of weather information available for two of Europe’s most important cities, complete with extensive listings of the locations and nature of historic weather in time series format.
 Dan Mayer, “A Brief History of Medicine and Statistics,” in Essential Evidence-Based Medicine (Cambridge, UK: Cambridge University Press, 2004), https://perma.cc/WKU39SUX.
This chapter of Mayer’s book highlights how the relationship between medicine and statistics depended greatly on social and political factors that made data and statistical training available for medical practitioners.
 Simon Vaughan, “Random Time Series in Astronomy,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, no. 1984 (2013): 1–28, https://perma.cc/J3VS6JYB.
Vaughan summarizes the many ways time series analysis is relevant to astronomy and warns about the danger of astronomers rediscovering time series principles or missing out on extremely promising collaborations with statisticians.
^{1} Examples include the British Doctors Study and the Nurses’ Health Study.
^{2} See, for example, Darrell Etherington, “Amazon, JPMorgan and Berkshire Hathaway to Build Their Own Healthcare Company,” TechCrunch, January 30, 2018, https://perma.cc/S789EQGW; Christina Farr, “Facebook Sent a Doctor on a Secret Mission to Ask Hospitals to Share Patient Data,” CNBC, April 5, 2018, https://perma.cc/65GFM2SJ.
^{3} This same Robert FitzRoy was captain of the HMS Beagle during the voyage that took Charles Darwin around the world. This voyage was instrumental in providing evidence to Darwin for the theory of evolution by natural selection.
^{4} The Box-Jenkins method has become a canonical technique for choosing the best parameters for an ARMA or ARIMA model to model a time series. More on this in Chapter 6.
^{5} That is, 100 separate data sets in different domains of various time series of different lengths.
^{6} Given the array of gadgets humans carry around with them as well as the timestamps they create as they shop for groceries, log in to a computer portal at work, browse the internet, check a health indicator, make a phone call, or navigate traffic with GPS, we can safely say that an average American likely produces thousands of time series data points every year of their life.