Live Online Training

Getting in Front on Data Quality

Data

Thomas Redman

Everyone knows high-quality data is important. After all, “garbage-in, garbage-out.” Bad data harms operations and decision-making, and it is absolutely catastrophic when it comes to machine learning. But few companies address data quality properly. They tend to focus on finding and fixing errors rather than finding ways to prevent errors, so their efforts are time-consuming, expensive, and often don’t work very well.

Fortunately, there is a better way: getting in front of the issues by finding and eliminating root causes of error. This simple change works, reducing future errors by 90 to 99 percent.

It takes a special person to provoke a change in approach, from fixing errors to preventing them. We call those who succeed “data provocateurs.” This hands-on workshop describes a four-step process for earning that title.

The first step is determining “how good the data really is.” We will describe a simple, powerful method, called the Friday Afternoon Measurement, which can be applied by almost anyone. Armed with the facts, provocateurs can make needed improvements.
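The course presents the measurement protocol itself. As a rough sketch only, assuming the measurement comes down to reviewing a sample of recent records, marking each attribute that contains an obvious error, and counting the records with no marks, the score could be tallied as follows. The function name and the sample data are hypothetical and are not taken from the course materials.

```python
from typing import Iterable, Sequence


def error_free_share(error_flags: Iterable[Sequence[bool]]) -> float:
    """Return the fraction of records in which no attribute was flagged.

    Each record is a sequence of booleans, one per reviewed attribute;
    True means a reviewer marked that attribute as an obvious error.
    """
    rows = list(error_flags)
    if not rows:
        raise ValueError("need at least one record")
    # A record counts as error-free only if none of its attributes is flagged.
    perfect = sum(1 for row in rows if not any(row))
    return perfect / len(rows)


# Hypothetical review of five records with three attributes each.
sample = [
    [False, False, False],
    [True, False, False],
    [False, False, False],
    [False, True, True],
    [False, False, False],
]

score = error_free_share(sample)
print(f"{score:.0%} of records are error-free")
```

A record with several flagged attributes still counts only once, so the result reads as the share of records that could be used without any correction.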

We’ll also address steps 2 (understanding customer needs) and 3 (completing an improvement project).

The fourth step involves becoming a role model of the behaviors needed. We'll learn about the four basic responsibilities for data quality and ask attendees to “hold up a mirror,” honestly evaluating their current status with respect to these roles and actions.

What you'll learn and how you can apply it

By the end of this live, hands-on, online course, you’ll understand:

  • Why high-quality data is so important
  • Why preventing new errors, instead of simply cleaning up existing ones, is so essential
  • What it takes to provoke the needed changes

And you’ll be able to:

  • Make a simple data quality measurement
  • Identify opportunities for improving data quality
  • Focus on the most important data
  • Assume basic responsibilities for data quality

This training course is for you because...

  • Your success on the job depends on data.
  • Your work informs other decision-makers in your organization, who rely on you to provide accurate and actionable data.
  • You are a data scientist and you want to increase the probability that your analytics, machine learning, and artificial intelligence programs succeed.

Prerequisites

  • None
  • An Excel spreadsheet for use with exercises will be provided during the course.

About your instructor

  • Dr. Thomas C. Redman, “the Data Doc,” President of Data Quality Solutions, helps start-ups and multinationals, senior executives, Chief Data Officers, and leaders buried deep in their organizations chart their courses to data-driven futures, with special emphasis on quality and analytics. Tom’s most important article is “Data’s Credibility Problem” (Harvard Business Review, December 2013). He has a Ph.D. in Statistics and two patents.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introduction (10 minutes)

The Business Case for Data Quality (25 minutes)

  • Presentation: Summary
  • Presentation: An Example of What’s Possible
  • Presentation: “The Stephanie Vignette” shows how people, doing what appears to be the right thing, actually subvert data quality efforts.
  • Discussion: Do you, unconsciously perhaps, behave like Stephanie?
  • Q&A

The “Data Provocateur” (20 minutes)

  • Presentation: The Data Provocateur, defined
  • Presentation: Four-step process
  • Discussion: Do you have what it takes to become a provocateur?
  • Q&A
  • Break (10 minutes)

Step 1: The Friday Afternoon Measurement (60 minutes)

  • Presentation: The FAM protocol
  • Exercise/Discussion: Completing a FAM
  • Presentation: The high cost of poor quality data
  • Discussion: Can you make a FAM within the context of your current role?
  • Q&A
  • Break (10 minutes)

Step 2: A basic overview of Understanding Customer Needs (5 minutes)

Step 3: A basic overview of Completing an Improvement Project (5 minutes)

Step 4: Your Basic Responsibilities for Data Quality (30 minutes)

  • Presentation: Introductions to The Process for Understanding Customer Needs (Step 2) and The Quality Improvement Cycle (Step 3).
  • Presentation: Your Basic Responsibilities for Data Quality (Step 4)
  • Exercise/Discussion: Are you Meeting Your Responsibilities Today?

Final Summary/Q&A (5 minutes)