Azure Blob storage is the default and preferred way to store data in HDInsight, which supports both the Hadoop Distributed File System (HDFS) and Azure Blob storage. This chapter covers uploading data to Blob storage and executing MapReduce jobs on it. It starts with several command-line utilities for uploading data and then looks at a couple of graphical clients. You’ll create your first MapReduce job and execute it using PowerShell. You’ll also use the .NET SDK to create and execute jobs on HDInsight. Finally, you’ll learn about Avro serialization. ...
© Vinit Yadav 2017
Vinit Yadav, Processing Big Data with Azure HDInsight, 10.1007/978-1-4842-2869-2_3
3. Working with Data in HDInsight
(1) Ahmedabad, Gujarat, India