May 2017
Beginner to intermediate
596 pages
15h 2m
English
We will use the same dataset as before, that is, 2 million customer records, addresses, and contacts.
Before we proceed, let's clean up the data created in previous chapters by following the steps explained here. Make sure the processes required for the cleanup are up and running: Hue, DFS, HiveServer2, ZooKeeper, and Kafka.
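One quick way to confirm that the processes listed above are reachable is to probe their TCP ports. The following is a minimal sketch, not part of the original setup: the helper names `is_port_open` and `check_services` are hypothetical, and the port numbers are commonly used defaults that are assumed here, so adjust them to match your installation.

```python
import socket

# Assumed default ports (not taken from this book's setup); change as needed.
SERVICES = {
    "Hue": 8888,
    "HDFS NameNode": 8020,
    "HiveServer2": 10000,
    "ZooKeeper": 2181,
    "Kafka": 9092,
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


def check_services(host="localhost", services=SERVICES):
    """Map each service name to True or False depending on whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:15} {'UP' if up else 'DOWN'}")
```

An open port only shows that something is listening on it. It does not prove the service is healthy, so treat a `DOWN` result as a prompt to start the process and an `UP` result as a first sanity check.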