Chapter 5. Cleaning Your Data
Who made this mess? Let’s clean it up! Cleaning your data should not be as hard as cleaning the room of a 5-year-old who loves paint and LEGO bricks (trust me, that job is not as easy as you think). Those of us who have spent enough time working with data know it can get messy, dirty, and downright unusable. Data is like a lot of things in life: it’s fragile and requires a lot of care and attention. I believe providing that care and attention shouldn’t be hard, primarily because Designer makes it so simple and straightforward.
Before we get started, please remember that each chapter in this book is additive. You will build on your knowledge of tools and techniques as you go through the book. You will learn things in this chapter that will help you in nearly every other chapter going forward. If you get to a new chapter and feel a bit overwhelmed, feel free to roll back and reread, think, and practice more. The goal, as I’ve stated, is to turn you into an absolute, bona fide Alteryx rock star. Let’s do this.
When we talk about having “clean data,” what do we mean, exactly? We are talking about cleaning up all the nuances in our data that don’t help us tell the story we are trying to tell. Usually, when I refer to “clean data,” I am referring to five specific factors:
We don’t have null values.
We don’t have missing values.
We don’t have duplicate data.
We don’t have incorrect or “bad” values (bad is subjective, of course). ...