B. Weissman, E. van de Laar, SQL Server Big Data Clusters, https://doi.org/10.1007/978-1-4842-5985-6_6

6. Working with Spark in Big Data Clusters

Benjamin Weissman¹ and Enrico van de Laar²

(1) Nurnberg, Germany

(2) Drachten, The Netherlands

So far, we have queried data inside our SQL Server Big Data Cluster using external tables and T-SQL code. There is, however, another way to query data stored in the HDFS filesystem of the Big Data Cluster. As you read in Chapter 2, the Big Data Cluster architecture also includes Spark, which means we can leverage the power of Spark to query that data.

Spark is a very powerful option for analyzing ...

SQL Server Big Data Clusters: Data Virtualization, Data Lake, and AI Platform by Benjamin Weissman, Enrico van de Laar