Book description
Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in the modern data stack.
You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions.
You'll learn:
- What a data pipeline is and how it works
- How data is moved and processed on modern data infrastructure, including cloud platforms
- Common tools and products used by data engineers to build pipelines
- How pipelines support analytics and reporting needs
- Considerations for pipeline maintenance, testing, and alerting
Table of contents
- Preface
- 1. Introduction to Data Pipelines
- 2. A Modern Data Infrastructure
- 3. Common Data Pipeline Patterns
- 4. Data Ingestion: Extracting Data
- 5. Data Ingestion: Loading Data
- 6. Transforming Data
- 7. Orchestrating Pipelines
- 8. Data Validation in Pipelines
- 9. Best Practices for Maintaining Pipelines
- 10. Measuring and Monitoring Pipeline Performance
- Index
Product information
- Title: Data Pipelines Pocket Reference
- Author(s):
- Release date: February 2021
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492087830