Spark with R on a multi-node HDInsight cluster
Although Spark can be deployed in single-node, standalone mode, its capabilities are best suited to multi-node applications. With this in mind, we will dedicate most of this chapter to practical Big Data crunching with Spark and R on a Microsoft Azure HDInsight cluster. As you should already be familiar with the deployment process of HDInsight clusters, our Spark workflows will contain one additional twist: the Spark framework will process the data straight from the Hive database, which will be populated with tables from HDFS. The introduction of Hive is a useful extension of the concepts covered in Chapter 5, R with Relational Database Management Systems (RDBMSs) and Chapter 6, R with Non-Relational ...