User-defined aggregation functions (Advanced)

In this recipe, we will implement a generic user-defined aggregation function (UDAF) that computes a simple linear regression. Standard user-defined functions and table-generating functions both operate in the map phase. An aggregation function, by contrast, can run in both the map phase (partial aggregation) and the reduce phase (final aggregation); this adds some complexity to the implementation.
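To make the map/reduce split concrete, here is a minimal plain-Java sketch. It assumes the regression is an ordinary least-squares fit of y = a + bx. It does not use Hive's API. The class name RegressionBuffer is hypothetical, and its method names only mirror those of Hive's GenericUDAFEvaluator (iterate, terminatePartial, merge, terminate). The sketch shows why the partial state has to be something that can be combined in any order: here, a count and four running sums.

```java
// Hypothetical, dependency-free sketch of the state a linear-regression UDAF
// could carry between the map and reduce phases. Method names mirror Hive's
// GenericUDAFEvaluator, but this class is NOT the Hive API.
final class RegressionBuffer {
    long n;        // number of (x, y) points seen
    double sumX;   // running sum of x
    double sumY;   // running sum of y
    double sumXY;  // running sum of x * y
    double sumXX;  // running sum of x * x

    // Map side: fold one input row into the buffer.
    void iterate(double x, double y) {
        n++;
        sumX += x;
        sumY += y;
        sumXY += x * y;
        sumXX += x * x;
    }

    // Map side: the partial result shipped to a reducer is just the sums.
    double[] terminatePartial() {
        return new double[] {n, sumX, sumY, sumXY, sumXX};
    }

    // Reduce side: combine a partial result produced by another task.
    void merge(double[] partial) {
        n += (long) partial[0];
        sumX += partial[1];
        sumY += partial[2];
        sumXY += partial[3];
        sumXX += partial[4];
    }

    // Reduce side: least-squares fit, returned as {slope, intercept}.
    // Returns null when the slope is undefined (fewer than two distinct x values).
    double[] terminate() {
        double denom = n * sumXX - sumX * sumX;
        if (n < 2 || denom == 0.0) {
            return null;
        }
        double slope = (n * sumXY - sumX * sumY) / denom;
        double intercept = (sumY - slope * sumX) / n;
        return new double[] {slope, intercept};
    }
}
```

Because these five numbers are enough to recover the slope and intercept, a reducer can merge any number of map-side partials without ever seeing the raw points.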

Getting ready

We will use a synthetic dataset as input. Create a file named points.tsv in the data subdirectory with the following contents:

0.91    16.19
10.86   246.40
21.00   475.88
30.68   705.55
41.30   936.11
50.65   1166.10
61.31   1396.09
70.56   1626.13
80.97   1856.33
90.56   2086.21

The two numbers in each line should be separated by a single tab character.
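If you want to check the file before loading it, the single-tab rule can be tested line by line. The following is a minimal sketch; the class PointsLine and its parse method are hypothetical helpers, not part of the recipe. Lines that use spaces instead of a tab, or that contain two tabs in a row, are rejected.

```java
// Hypothetical helper (not part of the recipe): parse one points.tsv line,
// requiring exactly two numeric fields separated by a single tab.
final class PointsLine {
    static double[] parse(String line) {
        // A limit of -1 keeps empty fields, so "1.0\t\t2.0" and "1.0\t"
        // produce the wrong field count and are rejected.
        String[] fields = line.split("\t", -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException("expected exactly one tab: " + line);
        }
        // Double.parseDouble throws NumberFormatException (a subclass of
        // IllegalArgumentException) for non-numeric fields.
        return new double[] {Double.parseDouble(fields[0]), Double.parseDouble(fields[1])};
    }
}
```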

How to do it...

The following steps will give you a better ...
