Chapter 7: Using Python Libraries in Azure Databricks

Azure Databricks supports several programming languages, but this chapter focuses on Python. We will explore the nuances of working with Python in Azure Databricks and introduce core concepts about models and data that will be studied in more detail later.

In this chapter, we will cover the following:

  • Installing popular Python libraries in Azure Databricks
  • Learning key concepts of the PySpark API
  • Using the Koalas API to manipulate data much as we would with pandas
  • Using visualization libraries to make plots and graphics

Each of these topics is covered in more detail in the following sections of this chapter.
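
Before installing anything, it can help to see which of the libraries named in the list above are already present in the current Python environment. The following is a minimal sketch that uses only the standard library. The package names in CANDIDATE_PACKAGES and the helper installed_versions are assumptions chosen for illustration; they are not taken from this chapter.

```python
from importlib import metadata

# Assumed distribution names for the libraries this chapter mentions;
# adjust the list to match the environment being checked.
CANDIDATE_PACKAGES = ["pyspark", "koalas", "pandas", "matplotlib"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


# Print a short report, one line per package.
for name, version in installed_versions(CANDIDATE_PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

In a Databricks notebook, a library reported as missing can typically be installed with the %pip install magic command, although the exact options depend on the cluster and runtime in use.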

Technical requirements

This chapter will ...

Get Distributed Data Systems with Azure Databricks now with the O’Reilly learning platform.