Exposing data through RESTful endpoints (web services) is useful and works well for many use cases, but some use cases require data from the lake on a scheduled basis, and in very large volumes. In such cases you could bring in Apache Sqoop as a way to export or transfer data from the lake to a consuming application's data store. Other mechanisms include scheduling MapReduce jobs that extract, transform, and load data out of Hadoop and push it to FTP locations using scripts, or using ETL tools that support HDFS integration, such as Talend and Pentaho Data Integration.
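As a rough sketch of the Sqoop-based approach (the JDBC URL, table name, HDFS path, and credentials below are illustrative placeholders, not values from any particular deployment), a scheduled export could be wrapped in a small Python script and triggered by a scheduler such as cron:

    import subprocess

    # Illustrative placeholders -- adjust to your environment.
    JDBC_URL = "jdbc:mysql://db-host:3306/consuming_app"   # target application's data store
    TABLE = "daily_sales_summary"                          # target table (must already exist)
    EXPORT_DIR = "/data/lake/batch/daily_sales_summary"    # HDFS directory holding the batch output

    def run_sqoop_export():
        """Push a batch of records from HDFS into the consuming application's database."""
        cmd = [
            "sqoop", "export",
            "--connect", JDBC_URL,
            "--username", "etl_user",
            "--password-file", "/user/etl/.db_password",   # keep credentials off the command line
            "--table", TABLE,
            "--export-dir", EXPORT_DIR,
            "--input-fields-terminated-by", ",",
        ]
        subprocess.run(cmd, check=True)   # raises CalledProcessError if the export fails

    if __name__ == "__main__":
        run_sqoop_export()

The same export could equally be defined directly as a cron entry or as a step in a workflow scheduler; the point is simply that the transfer runs on a schedule rather than on demand.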
Such exports are best done from batch storage, since batch storage ...