Chapter 5. Setting Up Effective Development Environments

Just like any other software system, data pipelines require development and testing environments as part of the software development lifecycle. With the combination of cloud services, data sources, sinks, and other dependencies, environments for data pipelines have a lot of moving parts that can be costly and confusing to juggle.

In this chapter, you’ll see how to create effective development environments, from techniques for local development to advice for setting up test and staging tiers that prepare pipeline changes for production.

The chapter opens with an overview of the differences between data environments and software environments and how to bring these concepts together to create environment tiers for data pipelines. You’ll see how to plan out these environments while balancing cost, complexity, and functional requirements against the needs of development, testing, and data consumers.

The second part of the chapter focuses on the design of local development environments and includes best practices to help you get the most out of containers and avoid common pitfalls.

While the term local development implies an environment that runs exclusively on a developer’s machine, the reality of working with data pipelines and cloud services is that you may have to connect to external resources in local development. To help reduce the cost and complexity of those connections, you’ll see strategies for limiting dependence on external services.
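One way to picture limiting dependence on an external service is to have pipeline code talk to a small interface and swap in a local stand-in during development. The sketch below is illustrative only and is not taken from the chapter: the names `ObjectStore`, `LocalObjectStore`, `make_store`, and `run_pipeline_step`, the `"local"` environment value, and the object keys are all hypothetical, and a real setup would add a cloud-backed implementation behind the same interface.

```python
from pathlib import Path
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface that the pipeline code depends on."""

    def write(self, key: str, data: bytes) -> None: ...

    def read(self, key: str) -> bytes: ...


class LocalObjectStore:
    """Local stand-in that keeps objects under a directory on disk."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def write(self, key: str, data: bytes) -> None:
        path = self.root / key
        # Create intermediate "folders" so keys like "raw/records.txt" work.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def read(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def make_store(env: str, local_root: Path) -> ObjectStore:
    """Choose a store by environment name (assumed value: 'local')."""
    if env == "local":
        return LocalObjectStore(local_root)
    # A cloud-backed implementation would be wired in here for other tiers.
    raise NotImplementedError(f"no store configured for environment {env!r}")


def run_pipeline_step(store: ObjectStore) -> int:
    """Toy step: read raw records, drop blank lines, write cleaned output."""
    raw = store.read("raw/records.txt").decode()
    cleaned = [line.strip() for line in raw.splitlines() if line.strip()]
    store.write("clean/records.txt", "\n".join(cleaned).encode())
    return len(cleaned)
```

Because `run_pipeline_step` only sees the interface, the same step can run against a temporary directory on a laptop and against a real service in a higher tier, with the choice made in one place.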

For those ...
