Chapter 5: Data Transformation and Processing with Synapse Notebooks

In this chapter, we will cover data processing and transformation with Synapse notebooks. We will look at how to use pandas DataFrames within Synapse notebooks to load data stored as Parquet files in Azure Data Lake Storage (ADLS) Gen2, explore it as a pandas DataFrame, and then write it back to ADLS Gen2 as a Parquet file.
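As a rough preview of that read, explore, and write-back flow, the following is a minimal sketch and not the recipe's actual code. It assumes that pandas in the notebook can resolve `abfss://` URLs (in Synapse this typically relies on an fsspec-compatible ADLS driver being available) and that a Parquet engine such as pyarrow is installed. The helper names `abfss_url`, `clean`, and `round_trip`, and the container, account, and file names in the usage comment, are hypothetical placeholders.

```python
from typing import Optional

import pandas as pd


def abfss_url(container: str, account: str, path: str) -> str:
    """Build an abfss:// URL for a file in an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative transformation: drop rows that are entirely empty."""
    return df.dropna(how="all").reset_index(drop=True)


def round_trip(source: str, target: str,
               storage_options: Optional[dict] = None) -> pd.DataFrame:
    """Read a Parquet file into pandas, transform it, and write it back."""
    # storage_options is passed through to the underlying filesystem driver;
    # what it should contain depends on how the workspace authenticates.
    df = pd.read_parquet(source, storage_options=storage_options)
    result = clean(df)
    result.to_parquet(target, index=False, storage_options=storage_options)
    return result


# Hypothetical usage inside a notebook cell (names are placeholders):
# src = abfss_url("raw", "mystorageaccount", "input/data.parquet")
# dst = abfss_url("raw", "mystorageaccount", "output/data_clean.parquet")
# df = round_trip(src, dst)
# df.head()
```

Keeping the URL construction and the transformation in separate functions makes each piece easy to check locally against an in-memory DataFrame before pointing the notebook at real storage.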

We will be covering the following recipes:

  • Landing data in ADLS Gen2
  • Exploring data with ADLS Gen2 to pandas DataFrame in Synapse notebook
  • Processing data from a PySpark notebook within Synapse
  • Performing read-write operations to a Parquet file using Spark in Synapse
  • Analytics with Spark

Landing data in ADLS Gen2

In ...
