Azure Data Scientist Associate Certification Guide

Book description

Develop the skills you need to run machine learning workloads in Azure and pass the DP-100 exam with ease

Key Features

  • Create end-to-end machine learning training pipelines, with or without code
  • Track experiment progress using the cloud-based, MLflow-compatible tracking capabilities of Azure ML services
  • Operationalize your machine learning models by creating batch and real-time endpoints

Book Description

The Azure Data Scientist Associate Certification Guide helps you acquire practical knowledge for machine learning experimentation on Azure. It covers everything you need to pass the DP-100 exam and become a certified Azure Data Scientist Associate.

Starting with an introduction to data science, you'll learn the terminology that will be used throughout the book and then move on to the Azure Machine Learning (Azure ML) workspace. You'll explore the studio interface and learn how to manage various components, such as datastores and compute clusters.

Next, the book focuses on no-code and low-code experimentation, showing you how to use the Automated ML wizard to discover and deploy the optimal model for your dataset. You'll also learn how to run end-to-end data science experiments using the designer provided in Azure ML Studio.

You'll then explore the Azure ML Software Development Kit (SDK) for Python and advance to creating experiments and publishing models using code. The book also guides you in optimizing your model's hyperparameters using HyperDrive before demonstrating how to use responsible AI tools to interpret and debug your models. Once you have a trained model, you'll learn how to operationalize it for batch or real-time inference and monitor it in production.
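To give a flavor of the code-first workflow covered in the later chapters, the following minimal sketch (not taken from the book) shows how a run and its metrics can be tracked with the AzureML Python SDK (azureml-core, v1); the experiment name and logged values are hypothetical placeholders:

    # Minimal sketch: track a run and its metrics in an Azure ML workspace
    from azureml.core import Workspace, Experiment

    ws = Workspace.from_config()              # reads a config.json exported from the portal
    exp = Experiment(workspace=ws, name="diabetes-training")  # hypothetical experiment name

    run = exp.start_logging()                 # start an interactive, tracked run
    run.log("alpha", 0.1)                     # log a hyperparameter value
    run.log("rmse", 56.7)                     # log an evaluation metric
    run.complete()                            # mark the run as finished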

By the end of this Azure certification study guide, you'll have gained the knowledge and the practical skills required to pass the DP-100 exam.

What you will learn

  • Create a working environment for data science workloads on Azure
  • Run data science experiments using Azure Machine Learning services
  • Create training and inference pipelines using the designer or code
  • Discover the best model for your dataset using Automated ML
  • Use hyperparameter tuning to optimize trained models
  • Deploy, use, and monitor models in production
  • Interpret the predictions of a trained model

Who this book is for

This book is for developers who want to infuse their applications with AI capabilities and data scientists looking to scale their machine learning experiments in the Azure cloud. Basic knowledge of Python is needed to follow the code samples used in the book. Some experience in training machine learning models in Python using common frameworks like scikit-learn will help you understand the content more easily.

Table of contents

  1. Azure Data Scientist Associate Certification Guide
  2. Contributors
  3. About the authors
  4. About the reviewers
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Share Your Thoughts
  6. Section 1: Starting your cloud-based data science journey
  7. Chapter 1: An Overview of Modern Data Science
    1. The evolution of data science
    2. Working on a data science project
      1. Understanding the business problem
      2. Acquiring and exploring the data
      3. Feature engineering
      4. Training the model
      5. Deploying the model
    3. Using Spark in data science
    4. Adopting the DevOps mindset
    5. Summary
    6. Further reading
  8. Chapter 2: Deploying Azure Machine Learning Workspace Resources
    1. Technical requirements
    2. Deploying Azure ML through the portal
      1. Using the deployment wizard
    3. Deploying Azure ML via the CLI
      1. Deploying Azure Cloud Shell
      2. Using the Azure CLI
      3. Installing the Azure ML CLI extension
      4. Deploying Azure ML using the az ml command
      5. Cleaning up the CLI resource group
    4. Alternative ways to deploy an Azure ML workspace
    5. Exploring the deployed Azure resources
      1. Understanding Role-Based Access Control (RBAC)
      2. RBAC inheritance
      3. Creating custom roles
      4. Assigning roles in the Azure ML workspace
    6. Summary
    7. Questions
    8. Further reading
  9. Chapter 3: Azure Machine Learning Studio Components
    1. Technical requirements
    2. Interacting with the Azure ML resource
    3. Exploring the Azure ML Studio experience
    4. Authoring experiments within Azure ML Studio
    5. Tracking data science assets in Azure ML Studio
    6. Managing infrastructure resources in Azure ML Studio
    7. Summary
  10. Chapter 4: Configuring the Workspace
    1. Technical requirements
    2. Provisioning compute resources
      1. Compute instances
      2. Compute clusters
      3. Inference clusters
      4. Attached compute
    3. Connecting to datastores
      1. Types of datastores
      2. Datastore security considerations
    4. Working with datasets
      1. Registering datasets
      2. Exploring the dataset
      3. Data drift detection
    5. Summary
    6. Questions
    7. Further reading
  11. Section 2: No code data science experimentation
  12. Chapter 5: Letting the Machines Do the Model Training
    1. Technical requirements
    2. Configuring an AutoML experiment
      1. Registering the dataset
      2. Returning to the AutoML wizard
    3. Monitoring the execution of the experiment
    4. Deploying the best model as a web service
      1. Understanding the deployment of the model
      2. Cleaning up the model deployment
    5. Summary
    6. Question
    7. Further reading
  13. Chapter 6: Visual Model Training and Publishing
    1. Technical requirements
    2. Overview of the designer
      1. The authoring screen/view
      2. Understanding the asset library
      3. Exploring the asset's inputs and outputs
    3. Building the pipeline with the designer
      1. Acquiring the data
      2. Preparing the data and training the model
      3. Executing the training pipeline
    4. Creating a batch and real-time inference pipeline
      1. Creating a batch pipeline
      2. Creating a real-time pipeline
    5. Deploying a real-time inference pipeline
    6. Summary
    7. Question
    8. Further reading
  14. Section 3: Advanced data science tooling and capabilities
  15. Chapter 7: The AzureML Python SDK
    1. Technical requirements
    2. Overview of the Python SDK
    3. Working in AzureML notebooks
    4. Basic coding with the AzureML SDK
      1. Authenticating from your device
      2. Working with compute targets
      3. Defining datastores
      4. Working with datasets
    5. Working with the AzureML CLI extension
    6. Summary
    7. Questions
    8. Further reading
  16. Chapter 8: Experimenting with Python Code
    1. Technical requirements
    2. Training a simple sklearn model within notebooks
    3. Tracking metrics in Experiments
      1. Tracking model evolution
      2. Using MLflow to track Experiments
    4. Scaling the training process with compute clusters
      1. Exploring the outputs and logs of a run
      2. Understanding execution environments
      3. Training the diabetes model on a compute cluster
      4. Utilizing more than a single compute node during model training
    5. Summary
    6. Questions
    7. Further reading
  17. Chapter 9: Optimizing the ML Model
    1. Technical requirements
    2. Hyperparameter tuning using HyperDrive
      1. Using the early termination policy
    3. Running AutoML experiments with code
    4. Summary
    5. Questions
    6. Further reading
  18. Chapter 10: Understanding Model Results
    1. Technical requirements
    2. Creating responsible machine learning models
    3. Interpreting the predictions of the model
      1. Training a loans approval model
      2. Using the tabular explainer
      3. Understanding the tabular data interpretation techniques
      4. Reviewing the interpretation results
    4. Analyzing model errors
    5. Detecting potential model fairness issues
    6. Summary
    7. Questions
    8. Further reading
  19. Chapter 11: Working with Pipelines
    1. Technical requirements
    2. Understanding AzureML pipelines
    3. Authoring a pipeline
      1. Troubleshooting code issues
    4. Publishing a pipeline to expose it as an endpoint
    5. Scheduling a recurring pipeline
    6. Summary
    7. Questions
    8. Further reading
  20. Chapter 12: Operationalizing Models with Code
    1. Technical requirements
    2. Understanding the various deployment options
    3. Registering models in the workspace
    4. Deploying real-time endpoints
      1. Understanding the model deployment options
      2. Profiling the model's resource requirements
      3. Monitoring with Application Insights
      4. Integrating with third-party applications
    5. Creating a batch inference pipeline
    6. Summary
    7. Questions
    8. Further reading
    9. Why subscribe?
  21. Other Books You May Enjoy
    1. Packt is searching for authors like you
    2. Share Your Thoughts

Product information

  • Title: Azure Data Scientist Associate Certification Guide
  • Author(s): Andreas Botsikas, Michael Hlobil
  • Release date: December 2021
  • Publisher(s): Packt Publishing
  • ISBN: 9781800565005