Data Lake for Enterprises

by Vivek Mishra, Tomcy John, Pankaj Misra
May 2017
Beginner to intermediate
596 pages
15h 2m
English
Packt Publishing
Content preview from Data Lake for Enterprises

Data indexing from Hive

Now that all the data loaded into Hadoop is visible through Hive tables, the complete customer data is available in Hadoop. The address and contacts data is present in both Elasticsearch and Hadoop, loaded by the Flink pipeline. The customer profile data is also available in Hadoop, loaded by the Sqoop job. However, the customer profile data is not yet in Elasticsearch.

To close this gap, we can export the Hive data as Elasticsearch indices. This can be achieved with the ES-Hadoop framework, which is part of the Elastic Stack.

For the ES-Hadoop framework to work with Hive, some quick setup and configuration is required, as summarized here:

  1. Download the ES-Hadoop binaries using the following command:
wget http://download.elastic.co/hadoop/elasticsearch-hadoop-5.4.0.zip ...
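
Once the setup is in place, exporting Hive data to Elasticsearch could look roughly like the HiveQL sketch below. It is an illustration under stated assumptions, not the book's own listing: the jar path, the table names `customer_profile` and `customer_profile_es`, the column list, the `customer/profile` index/type, and the `localhost:9200` node address are all hypothetical. The storage handler class and the `es.*` properties are the ones provided by ES-Hadoop.

```sql
-- Assumption: the downloaded archive has been unpacked; this jar path is hypothetical.
ADD JAR /path/to/elasticsearch-hadoop-5.4.0.jar;

-- Hypothetical external Hive table whose storage is an Elasticsearch index.
CREATE EXTERNAL TABLE customer_profile_es (
  id         BIGINT,
  first_name STRING,
  last_name  STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'customer/profile',  -- target index/type (assumed name)
  'es.nodes'    = 'localhost',         -- Elasticsearch host (assumed)
  'es.port'     = '9200'               -- Elasticsearch HTTP port (assumed)
);

-- Writing rows into the external table indexes them in Elasticsearch.
-- customer_profile stands in for the Hive table loaded by the Sqoop job.
INSERT OVERWRITE TABLE customer_profile_es
SELECT id, first_name, last_name
FROM customer_profile;
```

Modelling the index as an external table keeps the export a plain Hive query, so the same `SELECT` used to inspect the data in Hadoop can feed Elasticsearch.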

ISBN: 9781787281349