Chapter 3: Dirty Data
Statistical Assumptions of Patterns of Missing
Conventional Correction Methods
General First Steps on Receipt of a Data Set
Introduction
Dirty data refers to fields or variables within a data set that contain errors. Possible errors include spelling mistakes, incorrect values associated with fields or variables, and missing or blank values. Most real-world data sets have some degree of dirty data. As shown in Figure 3.1, dealing with dirty data is one of the multivariate data discovery steps.
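The three kinds of error named above can be illustrated with a small profiling pass. The sketch below is not the book's JMP workflow; it is a minimal Python example under stated assumptions. The records, the field names (city, age), the 0 to 120 age range, and the helpers is_missing and profile are all hypothetical, chosen only to show one missing value, one out-of-range value, and one likely misspelling.

```python
from collections import Counter

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"city": "Boston", "age": 34},
    {"city": "Bostn", "age": 41},      # likely spelling mistake
    {"city": "Boston", "age": -5},     # incorrect (impossible) value
    {"city": "", "age": 29},           # blank value
    {"city": "Chicago", "age": None},  # missing value
    {"city": "Chicago", "age": 52},
]


def is_missing(value):
    """Treat None and blank or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def profile(rows, field, valid=None):
    """Count missing values, collect invalid ones, and tally the rest."""
    missing = sum(1 for r in rows if is_missing(r.get(field)))
    present = [r[field] for r in rows if not is_missing(r.get(field))]
    invalid = [v for v in present if valid is not None and not valid(v)]
    return {"missing": missing, "invalid": invalid, "counts": Counter(present)}


city_report = profile(records, "city")
# Assumed plausibility rule: ages outside 0..120 are treated as incorrect.
age_report = profile(records, "age", valid=lambda a: 0 <= a <= 120)

# A value seen only once is a candidate for a spelling mistake, not proof of one.
rare_cities = [c for c, n in city_report["counts"].items() if n == 1]

print("city: missing =", city_report["missing"], "rare =", rare_cities)
print("age: missing =", age_report["missing"], "invalid =", age_report["invalid"])
```

A frequency count only flags candidates. Whether a rare spelling or an out-of-range number is truly an error still has to be judged against the original data source or domain knowledge.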
In some situations (for example, when the original data source can be obtained), ...
From Fundamentals of Predictive Analytics with JMP, Second Edition.