Azure Synapse Analytics Cookbook

Book description

Whether you’re an Azure veteran or just getting started, get the most out of your data with effective recipes for Azure Synapse

Key Features

  • Discover new techniques for using Azure Synapse, regardless of your level of expertise
  • Integrate Azure Synapse with other data sources to create a unified experience for your analytical needs using Microsoft Azure
  • Learn how to embed data governance and classification with Synapse Analytics by integrating Azure Purview

Book Description

As data warehouse management becomes increasingly integral to successful organizations, choosing and running the right solution is more important than ever. Microsoft Azure Synapse is an enterprise-grade, cloud-based data warehousing platform, and this book holds the key to using Synapse to its full potential. If you want the skills and confidence to create a robust enterprise analytical platform, this cookbook is a great place to start.

You'll learn how to execute enterprise-level deployments on medium-to-large data platforms. Using the step-by-step recipes and accompanying theory covered in this book, you'll understand how to integrate various services with Synapse to make it a robust solution for all your data needs. Whether you're new to Azure Synapse or already experienced with it, you'll find instructions for solving the problems you're likely to face, including using Azure services for data visualization and for artificial intelligence (AI) and machine learning (ML) solutions.

By the end of this Azure book, you'll have the skills you need to implement an enterprise-grade analytical platform, enabling your organization to explore and manage heterogeneous data workloads and employ various data integration services to solve real-world industry problems.

What you will learn

  • Discover the optimal approach for loading and managing data
  • Work with notebooks for various tasks, including ML
  • Run real-time analytics using Azure Synapse Link for Cosmos DB
  • Perform exploratory data analytics using Apache Spark
  • Read and write DataFrames into Parquet files using PySpark
  • Create reports on various metrics for monitoring KPIs
  • Combine Power BI and serverless SQL pool for distributed analysis
  • Enhance your Synapse analysis with data visualizations

Who this book is for

This book is for data architects, data engineers, and developers who want to learn and understand the main concepts of Azure Synapse Analytics and implement them in real-world scenarios.

Table of contents

  1. Azure Synapse Analytics Cookbook
  2. Foreword
  3. Contributors
  4. About the authors
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Share Your Thoughts
  6. Chapter 1: Choosing the Optimal Method for Loading Data to Synapse
    1. Choosing a data loading option
      1. Getting ready
      2. How to do it…
      3. How it works…
      4. There's more…
    2. Achieving parallelism in data loading using PolyBase
    3. Moving and transforming using a data flow
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. Adding a trigger to a data flow pipeline
      1. Getting ready
      2. How to do it…
      3. How it works…
    5. Unsupported data loading scenarios
      1. How to do it…
      2. There's more…
    6. Data loading best practices
      1. How to do it…
  7. Chapter 2: Creating Robust Data Pipelines and Data Transformation
    1. Reading and writing data from ADLS Gen2 using PySpark
      1. Getting ready
      2. How to do it…
      3. How it works…
    2. Visualizing data in a Synapse notebook
      1. Getting ready
      2. How to do it…
      3. How it works…
  8. Chapter 3: Processing Data Optimally across Multiple Nodes
    1. Working with the resource consumption model of Synapse SQL
      1. Architecture components of Synapse SQL
      2. Resource consumption
    2. Optimizing analytics with dedicated SQL pool and working on data distribution
      1. Understanding columnstore storage details
      2. Knowing when to use round-robin, hash-distributed, and replicated distributions
      3. Knowing when to partition a table
      4. Checking for skewed data and space usage
      5. Best practices
      6. Workload management for dedicated SQL pool
    3. Working with serverless SQL pool
      1. Getting ready
      2. How to do it…
      3. There's more…
    4. Processing and querying very large datasets
      1. Getting ready
      2. How to do it…
    5. Script for statistics in Synapse SQL
      1. How to do it…
      2. There's more…
  9. Chapter 4: Engineering Real-Time Analytics with Azure Synapse Link Using Cosmos DB
    1. Integrating an Azure Synapse ETL pipeline with Cosmos DB
      1. Introducing Cosmos DB
      2. Azure Synapse Link integration
      3. Supported features of Azure Synapse Link
      4. Azure Synapse runtime support
      5. Structured streaming support
      6. Network and data security support for Azure Synapse Link with Cosmos DB
    2. Setting up Azure Cosmos DB analytical store
      1. Getting ready
      2. How to do it…
    3. Enabling Azure Synapse Link and connecting Azure Cosmos DB to Azure Synapse
      1. Getting ready
      2. How to do it…
    4. IoT end-to-end solutions and getting real-time insights
      1. Getting ready
      2. How to do it…
    5. Use cases using Synapse Link
  10. Chapter 5: Data Transformation and Processing with Synapse Notebooks
    1. Landing data in ADLS Gen2
      1. Getting ready
      2. How to do it…
    2. Exploring data with ADLS Gen2 to pandas DataFrame in Synapse notebook
      1. Getting ready
      2. How to do it…
      3. There's more…
    3. Processing data from a PySpark notebook within Synapse
      1. How to do it…
    4. Performing read-write operations to a Parquet file using Spark in Synapse
      1. Getting ready
      2. How to do it…
    5. Analytics with Spark
      1. Getting ready
      2. How it works…
  11. Chapter 6: Enriching Data Using the Azure ML AutoML Regression Model
    1. Training a model using AutoML in Synapse
      1. Getting ready
      2. How to do it…
      3. How it works…
    2. Building a regression model from Azure Machine Learning in Synapse Studio
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Modeling and scoring using SQL pools
      1. Getting ready
      2. How to do it…
      3. How it works…
    4. An overview of Spark MLlib and Azure Synapse
    5. Integrating AI and Cognitive Services
      1. Getting ready
      2. How to do it…
      3. How it works…
  12. Chapter 7: Visualizing and Reporting Petabytes of Data
    1. Combining Power BI and a serverless SQL pool
      1. Getting ready
      2. How to do it…
      3. How it works…
    2. Working on a composite model
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Using materialized views to improve performance
      1. Getting ready
      2. How to do it…
      3. How it works…
  13. Chapter 8: Data Cataloging and Governance
    1. Configuring your Azure Purview account for Synapse SQL pool
      1. Getting ready
      2. How to do it…
      3. How it works…
    2. Scanning data using the Purview data catalog
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Enumerating resources within Synapse Studio
      1. Getting ready
      2. How to do it…
      3. How it works…
  14. Chapter 9: MPP Platform Migration to Synapse
    1. Understanding data migration challenges
      1. Tables and databases
      2. Data modeling
      3. Data Manipulation Language statements
      4. Functions, stored procedures, sequences, and triggers
    2. Configuring Azure Synapse Pathway
      1. Getting ready
      2. How to do it…
      3. How it works…
    3. Evaluating a data source to be migrated
      1. Getting ready
      2. How to do it…
    4. Generating a data migration assessment
      1. Getting ready
      2. How to do it…
    5. Supported data sources for migration
      1. IBM Netezza and Azure Synapse platform differences
      2. Oracle Exadata and Azure Synapse platform differences
      3. Snowflake and Azure Synapse platform differences
      4. Microsoft SQL Server and Azure Synapse platform differences
    6. Why subscribe?
  15. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts

Product information

  • Title: Azure Synapse Analytics Cookbook
  • Author(s): Gaurav Agarwal, Meenakshi Muralidharan
  • Release date: April 2022
  • Publisher(s): Packt Publishing
  • ISBN: 9781803231501