Summary
In this chapter, we first learned how to separate logic from the Spark engine. We then took a component that was already well tested in isolation, without the Spark engine, and carried out integration testing on it using SparkSession. Because the SparkSession test reused that well-tested component, the integration test did not have to cover every edge case and ran much faster. We then learned how to leverage partial functions to supply mocked data during the testing phase. We also covered ScalaCheck for property-based testing. Finally, we tested our code against different versions of Spark and learned how to change our DataFrame mock test to use an RDD.
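As a reminder of the first and third ideas, the following is a minimal sketch rather than the chapter's actual code. It assumes a hypothetical `UserTransaction` case class and a hypothetical `BonusLogic` object, and it uses a plain Scala `Seq` in place of a DataFrame or RDD so that it runs without Spark on the classpath. The logic lives in pure functions, and the data source is passed in as a function, so a test can supply mocked data through a partial function.

```scala
object MockedDataSketch {
  // Hypothetical record type; it stands in for a row of a DataFrame or RDD.
  final case class UserTransaction(userId: String, amount: Int)

  object BonusLogic {
    // Pure logic with no Spark dependency, so edge cases can be unit-tested quickly.
    def qualifiesForBonus(t: UserTransaction): Boolean = t.amount > 100

    // The loader is a parameter, so production code and tests can supply different data.
    def totalQualifyingAmount(load: String => Seq[UserTransaction], source: String): Int =
      load(source).filter(qualifiesForBonus).map(_.amount).sum
  }

  // Mocked data supplied as a partial function: it is defined only for the
  // source name the test cares about (the name "test-source" is an assumption).
  val mockLoader: PartialFunction[String, Seq[UserTransaction]] = {
    case "test-source" =>
      Seq(
        UserTransaction("a", 100), // boundary value, does not qualify
        UserTransaction("b", 101),
        UserTransaction("c", 250)
      )
  }

  // A PartialFunction is also a Function1, so it can be passed where a plain function is expected.
  val total: Int = BonusLogic.totalQualifyingAmount(mockLoader, "test-source")
}
```

In a real integration test, the loader would typically take a SparkSession and return a DataFrame or an RDD, while the edge cases stay covered by the fast unit tests on the pure logic.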