Book description
When data-driven applications fail, identifying the cause is both challenging and time-consuming, especially as data pipelines grow increasingly complex. Hunting for the root cause of an application failure in messy, raw, distributed logs is difficult for performance experts and a nightmare for data operations teams. This report examines DataOps processes and tools that enable you to manage modern data pipelines efficiently.
Author Ted Malaska describes a data operations framework and shows you the importance of testing and monitoring to plan, rebuild, automate, and then manage robust data pipelines, whether they run in the cloud, on premises, or in a hybrid configuration. You’ll also learn ways to apply performance monitoring software and AI to your data pipelines in order to keep your applications running reliably.
You’ll learn:
- How performance management software can reduce the risk of running modern data applications
- Methods for applying AI to provide insights, recommendations, and automation to operationalize big data systems and data applications
- How to plan, migrate, and operate big data workloads and data pipelines in the cloud and in hybrid deployment models
Table of contents
- 1. Introduction
- 2. How We Got Here
  - Excel Spreadsheets
  - Databases
  - Appliances
  - Extract, Transform, and Load Platforms
  - Kafka, Spark, Hadoop, SQL, and NoSQL platforms
  - Cloud, On-Premises, and Hybrid Environments
  - Machine Learning, Artificial Intelligence, Advanced Business Intelligence, Internet of Things
  - Producers and Considerations
  - Consumers and Considerations
  - Summary
- 3. The Data Ecosystem Landscape
- 4. Data Processing at Its Core
- 5. Identifying Job Issues
- 6. Identifying Workflow and Pipeline Issues
- 7. Watching and Learning from Your Jobs
- 8. Closing Thoughts
Product information
- Title: Rebuilding Reliable Data Pipelines Through Modern Tools
- Author(s): Ted Malaska
- Release date: July 2019
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492058168