4

Working with Databases

In this chapter, we are going to look at how to work with relational databases. Databases remain among the most common systems that data pipelines read from and write to, so it is important that we understand how to work with them efficiently. We will start by looking at the Spark JDBC API and then build a small database library that wraps it in a simple interface.

Specifically, we will look at the following topics:

  • Understanding the Spark JDBC API
  • Working with the Spark JDBC API
  • Loading the database configuration
  • Creating a database interface
  • Performing various database operations
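As a rough preview of the first two topics, the sketch below shows what a read through Spark's JDBC data source can look like. It is only an illustration under stated assumptions, not the library this chapter builds: the object name `JdbcReadSketch`, the helpers `jdbcOptions` and `readTable`, and the host, database, table, and credential values are all hypothetical, and it assumes Spark SQL and MySQL Connector/J 8 or later are on the classpath and that a MySQL server is reachable.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcReadSketch {

  // Collects the options Spark's JDBC data source reads.
  // All argument values passed in main below are placeholders.
  def jdbcOptions(
      host: String,
      port: Int,
      database: String,
      table: String,
      user: String,
      password: String
  ): Map[String, String] =
    Map(
      "url"      -> s"jdbc:mysql://$host:$port/$database",
      "driver"   -> "com.mysql.cj.jdbc.Driver", // Connector/J 8+ driver class (assumed)
      "dbtable"  -> table,
      "user"     -> user,
      "password" -> password
    )

  // Loads one table as a DataFrame using the options above.
  def readTable(spark: SparkSession, options: Map[String, String]): DataFrame =
    spark.read.format("jdbc").options(options).load()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-read-sketch")
      .master("local[*]")
      .getOrCreate()

    try {
      // Hypothetical connection details; replace with your own.
      val options =
        jdbcOptions("localhost", 3306, "my_db", "my_table", "my_user", "my_password")
      readTable(spark, options).printSchema()
    } finally {
      spark.stop()
    }
  }
}
```

Keeping the connection settings in a plain `Map` separates configuration from the Spark call itself, which is one way to make the settings easy to load from a file later.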

Technical requirements

We are going to use a MySQL database for the examples in this chapter. If you have ...
