Skip to Content
View all events

Introduction to Data Cleaning in Excel

Published by O'Reilly Media, Inc.

Beginner content levelBeginner

Identify and fix dirty data

Data professionals are expected to be able to clean data, but few are trained to do it. They may spend hours every month cleaning and fixing the same data over and over because the data has been inputted incorrectly or has been patched so often that it may collapse under the weight of another change. This comes at a significant cost to organizations in wasted hours and decreased profitability, and it could also have an adverse impact on decision-making processes.

Join expert Susan Walsh to learn a better way to ensure clean data, with technology no more complicated than Microsoft Excel. By using this software mainstay, you’ll get to know your data better, spot inaccuracies more quickly, and spare yourself the problems dirty data can cause. Through live walk-throughs, practice datasets, and the opportunity to ask questions and get answers, you’ll be equipped with the tools to accurately and efficiently clean and check your data, all from the comfort of your spreadsheet.

What you’ll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • Why data needs to be cleaned
  • The effects of dirty data
  • How to clean data effectively

And you’ll be able to:

  • Clean personal data, including names, addresses, and telephone numbers
  • Normalize supplier data
  • Spot-check data for duplicates, near duplicates, and errors

This live event is for you because...

  • You want to improve the accuracy of your data.
  • You want to improve your cleaning processes.
  • You want to learn how to clean data without coding.

Prerequisites

  • A computer with Microsoft Excel installed
  • A basic understanding of Excel (how to apply a formula, using shortcuts like “Ctrl-C” and “Ctrl-V,” and working across multiple tabs)

Recommended preparation:

  • Download the test file being used for the session (TBA)

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Ensuring data accuracy (20 minutes)

  • Presentation: The impact of dirty data; tips on accuracy; the principles of COAT
  • Group discussion: Your data cleaning training; data quality issues in your organization; your management of data quality

Cleaning normalized suppliers (55 minutes)

  • Presentation: What is normalization?; normalization best practice; removing suffixes, punctuation, and spaces; the risky list; how to remove suffixes on the risky list
  • Group discussion: Do you normalize your data?
  • Hands-on exercise: Remove suffixes from the risky list
  • Q&A
  • Break

Cleaning names (55 minutes)

  • Group discussion: Your biggest issue with customer data (names, email, addresses, phone numbers)
  • Presentation: Why names need to be cleaned; formatting names; splitting names using Text to Columns; checking names
  • Hands-on exercise: Split names with Text to Columns and CONCATENATE; check names
  • Q&A
  • Break

Cleaning addresses (85 minutes)

  • Group discussion: Your company standard for address formats
  • Presentation: Why addresses need cleaning; formatting addresses; checking cleaned addresses
  • Hands-on exercise: Format addresses using Text to Columns and CONCATENATE
  • Q&A
  • Break

Cleaning and checking phone numbers (20 minutes)

  • Group discussion: Are incorrectly formatted phone numbers the devil?
  • Presentation: Formatting numbers; how to format phone numbers

Wrap-up and Q&A (5 minutes)

Your Instructor

  • Susan Walsh

    With a decade of experience fixing your dirty data, Susan Walsh is the Classification Guru. Susan is Founder and Managing Director of The Classification Guru Ltd, a specialist data classification, taxonomy customization and data cleansing consultancy. She is an industry thought leader, TEDx speaker and author of the soon-to-be-published ‘Between the Spreadsheets: Classifying and Fixing Dirty Data’. She’s also the creator of COAT.

    Susan has developed a methodology to accurately and efficiently classify, cleanse and check data for errors which will help prevent costly mistakes. This could save days of laborious cleansing and classifying and can help your business find cost savings through spend and time management - supporting better, more informed business decisions.

    Susan brings clarity and accuracy to data and procurement; helps teams work more effectively and efficiently; and cuts through the jargon to address the issues of dirty data and its consequences in an entertaining and engaging way.

    Susan is passionate about helping you find the value in cleaning your ‘dirty data’ and raises awareness of the consequences of ignoring issues through her blogs, vlogs, webinars and speaking engagements.

    linkedinXlinksearch

Skill covered

Microsoft Excel