Map Reduce program to find distinct values
In this recipe, we are going to learn how to write a map reduce program to find distinct values from a given set of data.
Getting ready
To perform this recipe, you should have a running Hadoop cluster as well as an eclipse that is similar to an IDE.
How to do it
Sometimes, there may be a chance that the data you have contains some duplicate values. In SQL, we have something called a distinct function, which helps us get distinct values. In this recipe, we are going to take a look at how we can get distinct values using map reduce programs.
Let's consider a use case where we have some user data with us, which contains two columns: userId
and username
. Let's assume that the data we have contains duplicate records, ...
Get Hadoop: Data Processing and Modelling now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.