Book description
When data-driven applications fail, identifying the cause is both challenging and time-consuming—especially as data pipelines become more and more complex. Hunting for the root cause of application failure from messy, raw, and distributed logs is difficult for performance experts and a nightmare for data operations teams. This report examines DataOps processes and tools that enable you to manage modern data pipelines efficiently.
Author Ted Malaska describes a data operations framework and shows you the importance of testing and monitoring to plan, rebuild, automate, and then manage robust data pipelines—whether it’s in the cloud, on premises, or in a hybrid configuration. You’ll also learn ways to apply performance monitoring software and AI to your data pipelines in order to keep your applications running reliably.
You’ll learn:
- How performance management software can reduce the risk of running modern data applications
- Methods for applying AI to provide insights, recommendations, and automation to operationalize big data systems and data applications
- How to plan, migrate, and operate big data workloads and data pipelines in the cloud and in hybrid deployment models
Table of contents
- 1. Introduction
- 2. How We Got Here
  - Excel Spreadsheets
  - Databases
  - Appliances
  - Extract, Transform, and Load Platforms
  - Kafka, Spark, Hadoop, SQL, and NoSQL platforms
  - Cloud, On-Premises, and Hybrid Environments
  - Machine Learning, Artificial Intelligence, Advanced Business Intelligence, Internet of Things
  - Producers and Considerations
  - Consumers and Considerations
  - Summary
- 3. The Data Ecosystem Landscape
- 4. Data Processing at Its Core
- 5. Identifying Job Issues
- 6. Identifying Workflow and Pipeline Issues
- 7. Watching and Learning from Your Jobs
- 8. Closing Thoughts
Product information
- Title: Rebuilding Reliable Data Pipelines Through Modern Tools
- Author(s): Ted Malaska
- Release date: July 2019
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492058168
You might also like
book
Unlock Complex and Streaming Data with Declarative Data Pipelines
Unlocking the value of modern data is critical for data-driven companies. This report provides a concise, …
video
Case Study: How California State University used DataOps Principles to Build Data Pipelines for Rapid Deployment and Scalability
Though we have in the past depended on traditional data warehouses to drive business intelligence from …
book
Cost-Effective Data Pipelines
The low cost of getting started with cloud services can easily evolve into a significant expense …
video
Building Data Pipelines with Python
This course shows you how to build data pipelines and automate workflows using Python 3. From …