Azure Databricks, Pandas, and Opendatasets
video

by Alfredo Deza, Noah Gift
May 2022
Advanced
8m
English
Pragmatic AI Labs

Overview

Azure Databricks with Pandas and Open Datasets

Find out how to get a working Databricks cluster on Azure and then use the full Pandas API in that cluster with Open Datasets and a Python Jupyter Notebook. This video walks you through creating a workspace in Azure for the Databricks service, creating a cluster that comes with the PySpark Pandas API, and finally importing Open Datasets into the cluster.

Creating a Databricks cluster in Azure is straightforward, but it takes more work to run a Python Jupyter Notebook that has Azure ML Open Datasets installed and available in the cluster, while still using the full Pandas API you are used to and taking advantage of the clustering capabilities of Databricks.

By the end of this video you will understand how to:

  • Create an Azure Databricks service and workspace
  • Select a PySpark version that supports the Pandas API
  • Import the azureml-opendatasets PyPI package and install it in clusters
  • Run a Jupyter Notebook and attach it to a running cluster
  • Verify that the PySpark Pandas API is available along with the azureml-opendatasets package
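
The verification step in the last bullet can be sketched in a notebook cell roughly as follows. This is a minimal sketch and not the code shown in the video: the helper names (is_available, supports_pandas_api, check_cluster_environment) are hypothetical, and it assumes the two import names are pyspark.pandas (shipped with Apache Spark 3.2 and later) and azureml.opendatasets (provided by the azureml-opendatasets package).

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if `module_name` can be located on this interpreter.

    find_spec imports parent packages as needed, but not the module itself.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised when a parent package (for example `pyspark`) is missing.
        return False


def supports_pandas_api(spark_version: str) -> bool:
    """Return True if a Spark version string such as "3.2.1" is 3.2 or newer.

    Assumption: the pandas API on Spark (pyspark.pandas) first shipped in 3.2.
    """
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    return (major, minor) >= (3, 2)


def check_cluster_environment() -> dict:
    """Report whether the two modules this video relies on are importable."""
    # Hypothetical helper: the module names are assumptions, see the lead-in.
    wanted = ("pyspark.pandas", "azureml.opendatasets")
    return {name: is_available(name) for name in wanted}


if __name__ == "__main__":
    for name, found in check_cluster_environment().items():
        print(f"{name}: {'available' if found else 'MISSING'}")
```

Run in a notebook attached to the cluster, a MISSING entry for azureml.opendatasets would suggest the PyPI package was not installed on that cluster, and a MISSING entry for pyspark.pandas would suggest the selected runtime ships a Spark version older than 3.2.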

Publisher Resources

ISBN: 50132VIDEOPAIML