4 Managing data

This chapter covers

  • Fixing data quality problems
  • Transforming data before modeling
  • Organizing your data for the modeling process

In chapter 3, you learned how to explore your data and how to identify common data issues. In this chapter, you’ll see how to fix the data issues that you’ve discovered. After that, we’ll talk about transforming and organizing the data for the modeling process. Most of the examples in this chapter use the same customer data that you used in the previous chapter.[1]

As shown in the mental model (figure 4.1), this chapter again emphasizes the science of managing the data in a statistically valid way, prior to the model-building step.

Figure 4.1. Chapter 4 mental model

4.1. Cleaning data

In this section, ...
