Live online training

Data lineage monitoring essentials

Learn how to trace your data and monitor its quality in real time

Topic: Data
Andy Petrella
Sammy El Khammal

Data lineage and data quality are emerging concerns in data-driven businesses. Tracking them helps you better understand and govern your data processes, and lineage monitoring lets you detect the potential impact of misused data or defective processes.

Experts Andy Petrella and Sammy El Khammal show you how to apply data lineage and data quality monitoring to real-world use cases and walk you through examples of common business failures—and how a good data intelligence strategy could help you prevent them. Join in to learn step-by-step how to arm your project with painless systematic data traceability, quality monitoring, and data governance. You’ll discover how to boost the efficiency of your data team and use data intelligence supervision to enhance the global quality and productivity of your work.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • Best practices for data intelligence, monitoring, and data lineage tracing
  • How proper data lineage allows you to monitor data quality
  • How to apply data quality in the context of data intelligence
  • Model and machine learning governance basics

And you’ll be able to:

  • Monitor an end-to-end data process
  • Apply governance guidelines to your own projects
  • Control the quality of your data pipelines end to end
  • Explain to your collaborators the importance and the challenges of data quality

This training course is for you because...

  • You’re a data scientist or data engineer looking to raise your working standards.
  • You’re a project or product manager worried about monitoring your projects.
  • You work with a data team and want to communicate more effectively.
  • You’re looking for automated documentation about your data projects.
  • You’re looking for ways to improve your performance and governance standards beyond compliance.

Prerequisites

  • An intermediate understanding of how a data project works

Recommended preparation:

About your instructors

  • Andy is an entrepreneur with a background in mathematics and geospatial data analysis, focused on unlocking untapped business potential with new technologies in machine learning, artificial intelligence, and cognitive systems. In the open source community, Andy is known for his Spark Notebook project, bridging the distributed data science gap with the Scala ecosystem.

    Andy is the CEO of Kensu Inc., an analytics and AI governance company that created the Kensu Data Activity Manager (DAM), a first-of-its-kind GCP (Governance, Compliance & Performance) solution for data science. DAM automatically maps data across tools and teams in real time, making it a one-stop shop for DPOs and data managers for all aspects of GCP.

  • Sammy El Khammal is a young graduate of the QTEM masters network, where he combined his passion for data science and finance to participate in various business and data challenges around the world. Now that his time as a student is over, he continues to support students by accompanying them in their data challenges.

    In 2019, Sammy turned his interests into a career and joined Kensu as a field solution engineer to help customers maintain high quality standards in their processes and machine learning models. He shares his love for sustainable data science and data intelligence with various customers, from the financial sector to the automotive industry.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Data lineage and quality monitoring: An overview (65 minutes)

  • Group discussion: The challenges of a data team; how to relieve the pain
  • Presentation: A literature review on data team challenges
  • Q&A

Break (5 minutes)

Practical data lineage (45 minutes)

  • Presentation: The project and principles of data lineage collection
  • Demo: Modeling and reporting a project from A to Z
  • Q&A

Break (5 minutes)

Understanding the power of data lineage (50 minutes)

  • Presentation: Other advantages of data lineage
  • Hands-on exercise: Explore use cases and simulation of different business failures

Wrap-up and Q&A (10 minutes)