Chapter 1 described Hadoop as an open source distributed file system that allows processing of vast amounts of data. Microsoft’s implementation of Hadoop on its Azure cloud platform is called HDInsight, which is designed to handle large amounts of data in the cloud. HDInsight uses Azure blob storage for storing the data, to make Hadoop available as a service in the cloud. Azure is useful for storing large datasets in the cloud in a cost-effective manner. Microsoft only charges for resources actually used.
Apache Hive supports analysis of large datasets stored in Hadoop. Hive is a data warehouse infrastructure built on top ...