Book description
When data-driven applications fail, identifying the cause is both challenging and time-consuming, especially as data pipelines grow increasingly complex. Hunting for the root cause of an application failure in messy, raw, distributed logs is difficult for performance experts and a nightmare for data operations teams. This report examines DataOps processes and tools that enable you to manage modern data pipelines efficiently.
Author Ted Malaska describes a data operations framework and shows why testing and monitoring are essential when you plan, rebuild, automate, and then manage robust data pipelines, whether in the cloud, on premises, or in a hybrid configuration. You’ll also learn ways to apply performance monitoring software and AI to your data pipelines to keep your applications running reliably.
You’ll learn:
- How performance management software can reduce the risk of running modern data applications
- Methods for applying AI to provide insights, recommendations, and automation to operationalize big data systems and data applications
- How to plan, migrate, and operate big data workloads and data pipelines in the cloud and in hybrid deployment models
Table of contents
- 1. Introduction
- 2. How We Got Here
  - Excel Spreadsheets
  - Databases
  - Appliances
  - Extract, Transform, and Load Platforms
  - Kafka, Spark, Hadoop, SQL, and NoSQL Platforms
  - Cloud, On-Premises, and Hybrid Environments
  - Machine Learning, Artificial Intelligence, Advanced Business Intelligence, Internet of Things
  - Producers and Considerations
  - Consumers and Considerations
  - Summary
- 3. The Data Ecosystem Landscape
- 4. Data Processing at Its Core
- 5. Identifying Job Issues
- 6. Identifying Workflow and Pipeline Issues
- 7. Watching and Learning from Your Jobs
- 8. Closing Thoughts
Product information
- Title: Rebuilding Reliable Data Pipelines Through Modern Tools
- Author(s): Ted Malaska
- Release date: July 2019
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781492058168
You might also like
book
Unlock Complex and Streaming Data with Declarative Data Pipelines
Unlocking the value of modern data is critical for data-driven companies. This report provides a concise, …
article
From ChatGPT to HackGPT: Meeting the Cybersecurity Threat of Generative AI
Emerging generative AI technologies such as ChatGPT are putting new tools in the hands of hackers. …
article
Run Llama-2 Models Locally with llama.cpp
Llama is Meta’s answer to the growing demand for LLMs. Unlike its well-known technological relative, ChatGPT, …
article
Use Github Copilot for Prompt Engineering
Using GitHub Copilot can feel like magic. The tool automatically fills out entire blocks of code--but …