Data Lake for Enterprises
by Vivek Mishra, Tomcy John, Pankaj Misra
May 2017
Beginner to intermediate
596 pages
15h 2m
English
Packt Publishing

Data exports

Exposing data through RESTful endpoints (web services) is useful for many use cases, but some consumers need data from the lake on a schedule, and in large volumes. In that case you can bring in Apache Sqoop to transfer data from the lake to the data store of another consuming application. Other mechanisms include scheduled MapReduce jobs that extract, transform, and load data out of Hadoop and push it to FTP locations, using scripts or ETL tools that support HDFS integration, such as Talend and Pentaho Data Integration.
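As a rough illustration of the Sqoop option, the sketch below assembles a `sqoop export` command line that a scheduler (cron, for example) could run to push a directory of delimited files from HDFS into a relational table. It is a minimal sketch, not the book's own method: the JDBC URL, table name, HDFS path, user name, and password-file location are all hypothetical placeholders, and the function `build_sqoop_export_command` is introduced here purely for illustration.

```python
import shlex
from typing import List


def build_sqoop_export_command(
    jdbc_url: str,
    table: str,
    export_dir: str,
    username: str,
    password_file: str,
    num_mappers: int = 4,
    field_delimiter: str = ",",
) -> List[str]:
    """Assemble the argument list for a `sqoop export` run.

    Hypothetical helper: it only builds the command and does not execute it.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "export",
        "--connect", jdbc_url,              # JDBC URL of the consuming data store
        "--username", username,
        "--password-file", password_file,   # keeps the password off the command line
        "--table", table,                   # target table, which must already exist
        "--export-dir", export_dir,         # HDFS directory holding the files to export
        "--input-fields-terminated-by", field_delimiter,
        "--num-mappers", str(num_mappers),  # number of parallel export tasks
    ]


if __name__ == "__main__":
    # All values below are assumptions made up for this example.
    cmd = build_sqoop_export_command(
        jdbc_url="jdbc:postgresql://reports-db.example.com:5432/reports",
        table="customer_summary",
        export_dir="/datalake/batch/customer_summary",
        username="export_user",
        password_file="/user/export_user/.db_password",
    )
    # Print a shell-safe command string; a scheduled job could pass `cmd`
    # to subprocess.run(cmd, check=True) on a host with Sqoop installed.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting problems when the same arguments are later handed to a process launcher.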

Such exports are best done from batch storage, since batch storage ...

ISBN: 9781787281349