How Can I Clean My Data for Use in a Predictive Model?
on-demand course

with Matthew North
May 2017
Beginner to intermediate
5m
English
Infinite Skills

Overview

Garbage In/Garbage Out applies to more than just manufacturing. Dirty data can doom your predictive analytics project from the very start! In this video, Matt North will show you how to identify flaws, such as statistical outliers and missing values, to improve the usefulness and reliability of your results.

Using RapidMiner, Matt starts by importing a data set and examining it to confirm that it imports correctly with the right data types. You will learn how to quickly identify outliers and missing values, and how to correct those problems by applying filters to your data import. Business and data analysts who use data for predictive modeling will find these techniques useful. A basic understanding of statistics and of data organization and representation will help you get the most out of this video.
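
For readers who want to try the same checks outside RapidMiner, here is a minimal Python sketch of the two ideas described above: flagging missing values and flagging statistical outliers. It is not taken from the video and does not use RapidMiner. The function names, the sample `ages` column, the set of missing-value markers, and the 1.5 x IQR outlier rule are all assumptions chosen for illustration.

```python
import statistics

# Hypothetical column of imported values; None, "" and "NA" stand in for
# missing entries (an assumption, not RapidMiner's own convention).
ages = [23, 25, None, 27, 24, 26, 250, "", 25]


def find_missing(values, missing_markers=(None, "", "NA")):
    """Return the positions of entries that count as missing."""
    return [i for i, v in enumerate(values) if v in missing_markers]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Step 1: locate the missing entries and set them aside.
missing_positions = find_missing(ages)
present = [v for i, v in enumerate(ages) if i not in missing_positions]

# Step 2: flag outliers among the values that are present.
outliers = iqr_outliers(present)

# Step 3: filter both problems out before modeling.
cleaned = [v for v in present if v not in outliers]
```

Dropping rows is only one way to handle these problems. Depending on the data, replacing missing values or investigating an outlier before discarding it may be the better choice.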

  • Learn how to identify and handle missing values on data imports in RapidMiner.
  • Learn to identify and handle statistical outliers in RapidMiner.
  • Understand techniques for evaluating data quality.

Matt North is a professor of Information Systems at Utah Valley University, where he teaches courses on data analytics and on database development, administration, and security. He holds degrees from BYU, Utah State University, and West Virginia University. He served as a Fulbright appointee at Universidad Tecnológica Nacional in Argentina and is the recipient of the International Association for Computer Information Systems’ Ben Bauman Award for Excellence and the Gamma Sigma Alpha Outstanding Professor Award. He is the author of numerous articles, published papers, and book chapters, in addition to his two books, "Data Mining for the Masses" and "Life Lessons and Leadership".

Other videos in this series:

Does Correlation Prove Causation in Predictive Analytics?
How Do I Choose the Correct Predictive Model for My Organizational Questions?


Publisher Resources

ISBN: 9781491990872