4. Getting Data into Hadoop

You can have data without information, but you cannot have information without data.

Daniel Keys Moran

In This Chapter:

The data lake concept is presented as a new data processing paradigm.

Basic methods for importing CSV data into HDFS and Hive tables are presented.

Additional methods for using Spark to import data into Hive tables or directly for a Spark job are presented.

Apache Sqoop is introduced as a tool for ...
