User-defined aggregation functions (Advanced)
In this recipe, we will implement a generic user-defined aggregation function that computes a simple linear regression. Standard functions and table-generating functions both operate in the map phase. An aggregation function, by contrast, can run in both the map and the reduce phases, which adds some complexity to the implementation.
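To make the two-phase idea concrete, here is a minimal sketch in plain Python (not the Hive API) of how a simple linear regression can be split into partial aggregation and a final merge. The class name RegressionState and the helper fit are hypothetical; the method names iterate, merge, and terminate are chosen only to echo the evaluator methods a Hive aggregation function typically provides. The sketch assumes an ordinary least-squares fit of y = slope * x + intercept.

```python
from dataclasses import dataclass


@dataclass
class RegressionState:
    """Running sums that are sufficient to fit y = slope * x + intercept."""
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_xy: float = 0.0

    def iterate(self, x, y):
        # Map side: fold one raw (x, y) row into the partial state.
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_xy += x * y

    def merge(self, other):
        # Reduce side: combine another partial state into this one.
        self.n += other.n
        self.sum_x += other.sum_x
        self.sum_y += other.sum_y
        self.sum_xx += other.sum_xx
        self.sum_xy += other.sum_xy

    def terminate(self):
        # Final step: turn the merged sums into (slope, intercept).
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if self.n < 2 or denom == 0:
            return None  # not enough distinct x values to fit a line
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope, intercept


def fit(points):
    """Hypothetical helper: build a partial state from one chunk of rows."""
    state = RegressionState()
    for x, y in points:
        state.iterate(x, y)
    return state
```

Because the state consists only of sums, two chunks processed independently (for example, fit(chunk_a) and fit(chunk_b)) can be combined with merge before calling terminate, and the result should match a single pass over all rows up to floating-point rounding. That property is what allows part of the work to happen in the map phase.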
We will use a synthetic dataset as input. Create a file named points.tsv in the data subdirectory with the following contents:
0.91	16.19
10.86	246.40
21.00	475.88
30.68	705.55
41.30	936.11
50.65	1166.10
61.31	1396.09
70.56	1626.13
80.97	1856.33
90.56	2086.21
The two numbers in each line should be separated by a single tab character.
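Editors often convert tabs to spaces silently, so it can be safer to generate the file with a short script. The following is a sketch only: the function name write_points is hypothetical, and the path data/points.tsv assumes that the script is run from the directory that contains the data subdirectory.

```python
import os

# The ten (x, y) pairs listed in the recipe.
POINTS = [
    (0.91, 16.19), (10.86, 246.40), (21.00, 475.88), (30.68, 705.55),
    (41.30, 936.11), (50.65, 1166.10), (61.31, 1396.09), (70.56, 1626.13),
    (80.97, 1856.33), (90.56, 2086.21),
]


def write_points(path):
    """Write one 'x<TAB>y' line per point, with two decimal places."""
    # Create the parent directory if it does not exist yet.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", newline="\n") as f:
        for x, y in POINTS:
            f.write(f"{x:.2f}\t{y:.2f}\n")


if __name__ == "__main__":
    # Assumed location: the data subdirectory of the current directory.
    write_points(os.path.join("data", "points.tsv"))
```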
How to do it...
The following steps will give you a better ...