Chapter 6. Moving Average

This chapter presents a moving average solution in MapReduce/Hadoop. Before turning to the MapReduce solution, we look at the basic concepts of a moving average, starting with time series data. Time series data represents the values of a variable recorded at successive points in time, at intervals such as a second, minute, hour, day, week, month, quarter, or year. Semiformally, we can represent time series data as a sequence of triplets:

  • (k, t, v)

where k is a key (such as a stock symbol), t is a time (in hours, minutes, or seconds), and v is the associated value (such as the value of a stock at time t). Time series data typically arises whenever the same measurement is recorded repeatedly over a period of time. For example, the closing price of a company's stock forms a time series over minutes, hours, or days. The mean (or average) of time series observations (equally spaced in time, such as per hour or per day) over several consecutive periods is called the moving average. It is called moving because the average is continually recomputed as new time series data becomes available: each update drops the earliest value in the window and adds the most recent.
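To make the definition concrete, here is a minimal single-machine sketch in Python (not the MapReduce solution this chapter develops). It assumes the triplets fit in memory. The function names, the sample key `K1`, and the sample values are hypothetical and chosen only for illustration. It also assumes one particular convention for the first few points: until a full window of values has been seen, the average is taken over the values seen so far.

```python
from collections import defaultdict, deque


def simple_moving_average(values, window):
    """Yield the moving average after each new value.

    Assumption: before `window` values have been seen, the average is
    taken over the values seen so far.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()  # values currently inside the window
    total = 0.0       # running sum of the values in `recent`
    for v in values:
        recent.append(v)
        total += v
        if len(recent) > window:
            # Drop the earliest value; the newest was just added.
            total -= recent.popleft()
        yield total / len(recent)


def moving_average_by_key(triplets, window):
    """Group (k, t, v) triplets by key, order by time, and average."""
    by_key = defaultdict(list)
    for k, t, v in triplets:
        by_key[k].append((t, v))

    result = {}
    for k, series in by_key.items():
        series.sort(key=lambda tv: tv[0])  # order observations by time t
        times = [t for t, _ in series]
        averages = simple_moving_average([v for _, v in series], window)
        result[k] = list(zip(times, averages))
    return result


# Hypothetical sample data: (key, time, value), deliberately out of order.
sample = [
    ("K1", 3, 30.0),
    ("K1", 1, 10.0),
    ("K1", 2, 20.0),
    ("K1", 5, 50.0),
    ("K1", 4, 40.0),
]
averaged = moving_average_by_key(sample, window=3)
```

Keeping a running sum means each new observation costs constant time, regardless of the window size. The per-key grouping and time ordering done here in memory correspond to the work a distributed solution must hand to the framework.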

Example 1: Time Series Data (Stock Prices)

Consider the data shown in Table 6-1 for the closing stock price of a company called MY-STOCK (note that this is a fake stock symbol).

Table 6-1. Time series data for MY-STOCK closing price
Time series Date Closing price
1 2013-10-01
