4. Getting Data into Hadoop
You can have data without information, but you cannot have information without data.
Daniel Keys Moran
In This Chapter:
The data lake concept is presented as a new data processing paradigm.
Basic methods for importing CSV data into HDFS and Hive tables are presented.
Additional methods for using Spark to import data into Hive tables or directly for a Spark job are presented.
Apache Sqoop is introduced as a tool for ...
From Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale.