Chapter 6: Using Synapse Spark Pools

In your modern data warehouse project, you may use Azure Data Factory ETL pipelines (see Chapter 5, Integrating Data into Your Modern Data Warehouse) to integrate and transform incoming data according to your needs. However, you may be a more code-oriented developer, you may already be very proficient with Spark, or your transformation needs may exceed the functionality or the available compute power of Data Factory.

Maybe you need to train and implement machine learning models as part of your project, and you want a Spark engine that can scale to your needs and offers suitable libraries and tight integration with all the other tools that you plan to use on Azure.

This chapter will ...
