Mocking data sources using partial functions

In this section, we will cover the following topics:

  • Creating a Spark component that reads data from Hive
  • Mocking the component
  • Testing the mock component

Let's assume that the following is our production code:

 ignore("loading data on prod from hive") {
   UserDataLogic.loadAndGetAmount(spark, HiveDataLoader.loadUserTransactions)
 }

Here, we are using the UserDataLogic.loadAndGetAmount function, which loads our user transaction data and gets the amount of each transaction. This method takes two arguments. The first argument is a SparkSession and the second argument is the provider of the DataFrame: a function that takes a SparkSession and returns a DataFrame, as shown in the following example:

object ...
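
The listing above is cut off in this excerpt. As a rough sketch of the idea described in the text, and not the book's actual code, the following shows how a function of type SparkSession => DataFrame can be passed in as the data provider, so that an in-memory mock can stand in for the Hive loader. The body of loadAndGetAmount, the MockUserTransactions object, and the userId and amount column names are all assumptions made for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumed shape of the component: the body and the "amount" column
// are guesses, since the original listing is truncated.
object UserDataLogic {
  // provider is any function from SparkSession to DataFrame,
  // so the data source can be swapped without touching this logic.
  def loadAndGetAmount(spark: SparkSession,
                       provider: SparkSession => DataFrame): DataFrame =
    provider(spark).select("amount")
}

// Hypothetical mock provider: builds a small in-memory DataFrame
// instead of querying Hive. Column names are illustrative only.
object MockUserTransactions {
  def load(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("user_1", 100), ("user_2", 101)).toDF("userId", "amount")
  }
}

object MockProviderDemo extends App {
  // Local session, so no Hive or cluster is needed.
  val spark = SparkSession.builder()
    .master("local[2]")
    .appName("mock-provider-sketch")
    .getOrCreate()

  // The method MockUserTransactions.load is converted to a function
  // value because a SparkSession => DataFrame is expected here.
  val amounts = UserDataLogic
    .loadAndGetAmount(spark, MockUserTransactions.load)
    .collect()
    .map(_.getInt(0))
    .toList
    .sorted

  assert(amounts == List(100, 101))
  spark.stop()
}
```

Because the logic depends only on the function type, the production call can pass HiveDataLoader.loadUserTransactions while a test passes the mock, with no change to UserDataLogic itself.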
