Chapter 7: Using Databricks Spark Clusters
In the previous chapter, Chapter 6, Using Synapse Spark Pools, you learned about Spark and the Spark engine integrated into Synapse. But what about cases where you only need a Spark cluster to interact with your Data Lake Store? At this point in time, you would choose Databricks over Synapse Spark pools when, for example, you need to work on Spark 3.0 or implement Structured Streaming. If R is a required programming language, you will need Databricks as well, and the same applies to the Databricks-specific features of Delta Lake, such as vacuuming. Synapse will offer most of these options in the future, but at the moment they are available only in Databricks.
With Azure Databricks, Microsoft offers ...