Ten (or So) Best Practices in Data Preparation
In This Chapter
Understanding the key steps in data validation
Preparing data for analysis
The main goal of this book is to get you familiar with the statistical methods that allow you to build useful statistical models. But as you’ve probably noticed, we’ve spent a great deal of time, particularly in Part II, talking about getting data ready for analysis. Statistical software packages are extremely powerful these days, but they can’t overcome poor-quality data. This chapter provides a checklist of things you need to do before you start building statistical models.
Check Data Formats
Your analysis always starts with a raw data file. Raw data files come in many different shapes and sizes. Mainframe data is different from PC data, spreadsheet data is formatted differently from web data, and so forth. And in the age of big data, you will surely be faced with data from a variety of sources. Your first step in analyzing your data is making sure you can read the files you’re given. Chapter 7 gives some tips about how to do this.
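One quick way to confirm that you can read a delimited text file is to let the software guess the delimiter and then look at the first few rows yourself. The following is only a minimal sketch in Python, using the standard csv module. The function name preview_delimited, the list of candidate delimiters, and the sample data are all made up for illustration, and the sketch assumes you've already read a chunk of the file into a text string.

```python
import csv
import io
from itertools import islice


def preview_delimited(text, n_rows=5):
    """Guess the delimiter of a text sample and return the first few rows.

    Hypothetical helper for illustration; the candidate delimiters
    (comma, semicolon, tab, pipe) are an assumption, not a complete list.
    """
    # Sniffer inspects the sample and raises csv.Error if it cannot decide.
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t|")
    reader = csv.reader(io.StringIO(text), dialect)
    # Keep only the first n_rows so a large sample is never fully parsed.
    rows = list(islice(reader, n_rows))
    return dialect.delimiter, rows


# Made-up sample standing in for the first lines of a raw data file.
sample = "id;name;amount\n1;Ann;10.50\n2;Bob;7.25\n"
delimiter, rows = preview_delimited(sample)
print("Delimiter:", repr(delimiter))
for row in rows:
    print(row)
```

With a real file, you would typically read just the first few kilobytes and pass that text to the function. If the guess fails or the rows look scrambled, that's a sign the file isn't in the format you were told it was, and it's better to find that out before you load the whole thing.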
Chapter 6 talks about the formats of the individual data fields, or variables, in your data file. You need to actually look at what each field contains. For example, it’s not wise to trust that ...