The purpose of this chapter is to present a moving average solution in MapReduce/Hadoop. Before presenting a MapReduce solution, we will look at the basic concepts of a moving average. First, though, we need to understand time series data. Time series data represents the values of a variable recorded at successive points in time, at an interval such as a second, minute, hour, day, week, month, quarter, or year. Semiformally, we can represent time series data as a sequence of triplets (k, t, v), where:
k is a key (such as a stock symbol),
t is a time (in hours, minutes, or seconds), and
v is the associated value (such as the value of a stock at time t).
Typically, time series data occurs whenever the same measurements are recorded over a period of time. For example, the closing price of a company stock is time series data over minutes, hours, or days. The mean (or average) of time series data (observations equally spaced in time, such as per hour or per day) from several consecutive periods is called the moving average. It is called moving because the average is continually recomputed as new time series data becomes available: each step drops the earliest value in the window and adds the most recent one.
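To make the "drop the earliest, add the most recent" idea concrete, here is a minimal single-machine sketch in Python. It is not the MapReduce solution, only an illustration of the concept. The names `Point` and `moving_average`, the key `DEMO`, and the sample values are all hypothetical and chosen for this example. The sketch also assumes that an average is emitted only once the window is full.

```python
from collections import deque
from typing import Iterable, Iterator, NamedTuple, Tuple


class Point(NamedTuple):
    """One (k, t, v) triplet of a time series (hypothetical helper type)."""
    k: str    # key, such as a stock symbol
    t: int    # time, here a simple day index (an assumption of this sketch)
    v: float  # value observed at time t


def moving_average(points: Iterable[Point], window: int) -> Iterator[Tuple[int, float]]:
    """Yield (t, average of the last `window` values) once the window is full."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()  # values currently inside the window
    total = 0.0       # running sum of the values in `recent`
    # Sort by time so the window always holds consecutive periods.
    for p in sorted(points, key=lambda p: p.t):
        recent.append(p.v)  # add the most recent value
        total += p.v
        if len(recent) > window:
            total -= recent.popleft()  # drop the earliest value
        if len(recent) == window:
            yield p.t, total / window


# Made-up values for a made-up key, used only to exercise the function.
series = [Point("DEMO", t, v)
          for t, v in enumerate([10.0, 20.0, 30.0, 40.0, 50.0], start=1)]
averages = list(moving_average(series, window=3))
print(averages)  # [(3, 20.0), (4, 30.0), (5, 40.0)]
```

Keeping a running sum means each new observation costs constant time, regardless of the window size, instead of re-adding every value in the window.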
Consider the data shown in Table 6-1 for the closing stock price of a company called MY-STOCK (note that this is a fake stock symbol).
|Time series||Date||Closing price|