Chapter 8. Mocks

As illustrated in Chapter 7, it’s desirable to replace data pipeline dependencies when unit testing. This helps reduce cloud costs, as you aren’t consuming resources or quotas while testing, and it makes tests easier to run in CI. It can also provide better test coverage than testing against live services.

With the different types of dependencies in data pipelines, creating mocks can feel like peeling an onion. Maybe you just created a mock for unit-testing code that acquires data from an API, and now you are back on Stack Overflow looking for advice on how to mock interactions with cloud storage. It’s not that mocking is difficult; it’s the variety of interfaces data pipelines interact with that can make this endeavor challenging.

This chapter eliminates the onion peeling by consolidating techniques for replacing common data pipeline dependencies in one place. It starts with advice on how to evaluate test double placement and efficacy, then shows how to build mocks for generic interfaces and cloud services using common Python modules and CSP client mock libraries. The last technique is to use test databases for situations where you need to test code that manipulates database objects.
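
As a taste of the first kind of replacement, here is a minimal sketch of mocking an API call with Python's standard unittest.mock module. The functions fetch_records and count_valid_records and the URL are hypothetical names invented for this illustration, not code from the book; only the unittest.mock and urllib calls are real library APIs.

```python
import json
from unittest import mock
from urllib import request


def fetch_records(url):
    """Hypothetical pipeline step: download JSON records from an API."""
    with request.urlopen(url) as response:
        return json.loads(response.read())


def count_valid_records(url):
    """Hypothetical unit under test: count records that carry an 'id' field."""
    return sum(1 for record in fetch_records(url) if "id" in record)


def test_count_valid_records():
    # Canned response body standing in for what the live API would return.
    payload = json.dumps([{"id": 1}, {"name": "no id"}, {"id": 2}]).encode()

    # Replace urlopen for the duration of the test so no network call is made.
    with mock.patch("urllib.request.urlopen") as fake_urlopen:
        # urlopen is used as a context manager, so configure the object
        # returned by __enter__ to hand back the canned payload.
        fake_response = fake_urlopen.return_value.__enter__.return_value
        fake_response.read.return_value = payload

        assert count_valid_records("https://example.com/records") == 2

    # Confirm the code under test requested the URL it was given.
    fake_urlopen.assert_called_once_with("https://example.com/records")
```

The mock is placed at the boundary where the code leaves the process (the HTTP call), so the parsing and counting logic still runs for real. Where to place a test double is a judgment call, and patching at a different layer may suit other code better.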
