HDInsight Essentials - Second Edition by Rajesh Nadipalli

Chapter 5. Ingest and Organize Data Lake

In this chapter, we will look at how to ingest data into the newly created Data Lake and organize it so that it is effective and useful. The topics covered in this chapter are as follows:

  • End-to-end Data Lake solution
  • Ingest data using HDFS commands
  • Ingest data to Azure Blob using Azure PowerShell
  • Ingest data using CloudXplorer
  • Using Sqoop to move data from RDBMS to cluster
  • Organizing your data in HDFS
  • Managing metadata using HCatalog

End-to-end Data Lake solution

In the next few chapters, we will build an end-to-end Data Lake solution using HDInsight. As discussed in Chapter 2, Enterprise Data Lake using HDInsight, the three key components required for a Data Lake are:

  • Ingest and organize
  • Transform
  • Access, analyze, and report ...
