Understanding the data
As already mentioned, understanding data is interleaved with data preparation. To know which variables need recoding, which variables have missing values, and how to combine variables into new ones, you need to understand the data you are dealing with in depth. A simple overview of the data can give you this understanding. Such an overview might be good enough for a small dataset, or for checking just a small subset of a large one.
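A simple overview of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the column names, and the helper functions head and missing_counts are all hypothetical and do not come from the book's own examples.

```python
from collections import Counter

# Hypothetical sample rows for illustration; None marks a missing value.
rows = [
    {"age": 34, "education": "Bachelors", "income": 52000},
    {"age": None, "education": "High School", "income": 31000},
    {"age": 45, "education": None, "income": None},
    {"age": 29, "education": "Bachelors", "income": 48000},
]


def head(rows, n=3):
    """Return the first n rows for a quick visual check of a small subset."""
    return rows[:n]


def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            counts[column] += int(value is None)
    return dict(counts)


if __name__ == "__main__":
    for row in head(rows, 2):
        print(row)
    print(missing_counts(rows))
```

Looking at the first few rows shows the shape and typical values of the data, while the per-column count of missing values points to the variables that may need special handling during preparation.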
You can learn more about the variables by showing their distributions graphically. Basic statistical methods are also useful for a data overview. Finally, sometimes these basic statistical results and graphs are already exactly what you ...
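As a rough sketch of basic statistics and a graphical view of a distribution, the following Python example uses only the standard library. The values and the helper names describe and frequency_bars are hypothetical, chosen for illustration; a real analysis would more likely rely on a dedicated statistics or plotting tool.

```python
import statistics
from collections import Counter

# Hypothetical values for illustration only.
ages = [23, 25, 25, 31, 34, 34, 34, 41, 47, 52]
education = [
    "Bachelors", "High School", "Bachelors",
    "Graduate", "Bachelors", "High School",
]


def describe(values):
    """Basic descriptive statistics for a continuous variable."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def frequency_bars(values):
    """One text bar per distinct value, most frequent first."""
    counts = Counter(values)
    return [f"{value:<12} {'*' * n} ({n})" for value, n in counts.most_common()]


if __name__ == "__main__":
    for name, result in describe(ages).items():
        print(f"{name}: {result}")
    for line in frequency_bars(education):
        print(line)
```

The descriptive statistics summarize a continuous variable numerically, and the text bars give a crude picture of how often each value of a discrete variable occurs.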