Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you will learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available in the framework of the data engineering lifecycle.
Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You will understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, governance, and deployment that are critical in any data environment regardless of the underlying technology.
This book will help you:
- Assess data engineering problems using an end-to-end data framework of best practices
- Cut through marketing hype when choosing data technologies, architecture, and processes
- Use the data engineering lifecycle to design and build a robust architecture
- Incorporate data governance and security across the data engineering lifecycle
Table of contents
1. Data Engineering Described
- What is data engineering?
- Data Engineering Skills and Activities
- Data Engineers Inside an Organization
- Summary
- Further reading
- Links
2. The Data Engineering Lifecycle
- What is the Data Engineering Lifecycle?
- The major undercurrents across the data engineering lifecycle
- Summary
- Further reading
- Further watching
3. Designing Good Data Architecture
- What is Data Architecture?
- Major Architecture Concepts
- Examples & Types of Data Architecture
- Who’s Involved with Designing a Data Architecture?
- Summary
- Further reading
4. Choosing technologies across the Data Engineering Lifecycle
- Team size and capabilities
- Speed to market
- Interoperability
- Cost optimization and business value
- Today vs. the future - Immutable vs. Transitory Technologies
- Location: On-Prem, Cloud, Hybrid, Multi-Cloud, and more
- Build vs. Buy
- Monolith vs. Modular
- Serverless vs. Infrastructure
- Undercurrents and how they impact choosing technologies
- Optimization, performance, and the benchmark wars
- Summary
5. Ingestion
- What Is Data Ingestion?
- Key Engineering Considerations for the Ingestion Phase
- Batch Ingestion Patterns
- Streaming Ingestion Patterns
- Ingestion Technologies
- Who You’ll Work With
- Undercurrents
- Conclusion
- Title: Fundamentals of Data Engineering
- Author(s): Joe Reis, Matt Housley
- Release date: September 2022
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098108304
