© Benjamin Weissman and Enrico van de Laar 2020
B. Weissman, E. van de Laar, SQL Server Big Data Clusters, https://doi.org/10.1007/978-1-4842-5985-6_6

6. Working with Spark in Big Data Clusters

Benjamin Weissman (1) and Enrico van de Laar (2)
(1) Nurnberg, Germany
(2) Drachten, The Netherlands


So far, we have queried data inside our SQL Server Big Data Cluster using external tables and T-SQL code. There is, however, another way to query data stored in the HDFS filesystem of the Big Data Cluster. As you read in Chapter 2, the Big Data Cluster architecture also includes Spark, which means we can use Spark to query the data stored inside the cluster.

Spark is a very powerful option for analyzing ...
