Chapter 5. Ingest and Organize Data Lake

In this chapter, we will look at how to ingest data into the newly created Data Lake and organize it so that it is effective and useful. The topics covered in this chapter are as follows:

  • End-to-end Data Lake solution
  • Ingest data using HDFS commands
  • Ingest data to Azure Blob using Azure PowerShell
  • Ingest data using CloudXplorer
  • Using Sqoop to move data from RDBMS to cluster
  • Organizing your data in HDFS
  • Managing metadata using HCatalog

End-to-end Data Lake solution

In the next few chapters, we will build an end-to-end Data Lake solution using HDInsight. As discussed in Chapter 2, Enterprise Data Lake using HDInsight, the three key components required for a Data Lake are:

  • Ingest and organize
  • Transform
  • Access, analyze, and report ...
