During preparation, raw data is made ready for exploration. This preparation is often a very interesting process. Data is very frequently fraught with all kinds of quality issues, and you will likely spend a significant amount of time dealing with them.
Why? There are a number of reasons:
- The data is simply incorrect
- Parts of the dataset are missing
- Data is not represented using measurements appropriate for your analysis
- The data is in formats not convenient for your analysis
- Data is at a level of detail not appropriate for your analysis
- Not all the fields you need are available from a single source
- The representation of data differs depending ...
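To make a few of these issues concrete, here is a minimal sketch in plain Python of how incorrect values, missing values, inconvenient units, and inconsistent formats might be handled. Everything in it is an assumption made for illustration: the records, the field names (`temp_f`, `measured`), the `-999` sentinel, and the helpers `parse_date` and `prepare` are hypothetical and do not come from any particular dataset or library.

```python
from datetime import datetime

# Hypothetical raw records; all field names and values are made up.
raw_records = [
    {"id": 1, "temp_f": "68.0", "measured": "2021-03-01"},
    {"id": 2, "temp_f": "", "measured": "2021-03-02"},      # missing value
    {"id": 3, "temp_f": "-999", "measured": "03/03/2021"},  # sentinel, other date format
]

# Assumed set of date formats that appear in the source data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known format in turn; return None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def prepare(record):
    """Return a cleaned copy of one record, using None for unusable values."""
    text = record["temp_f"].strip()
    temp_f = float(text) if text else None
    # Treat the assumed sentinel (-999) as incorrect data rather than a reading.
    if temp_f is not None and temp_f <= -999:
        temp_f = None
    # Convert Fahrenheit to Celsius, the unit this hypothetical analysis wants.
    temp_c = None if temp_f is None else round((temp_f - 32) * 5 / 9, 1)
    return {
        "id": record["id"],
        "temp_c": temp_c,
        "measured": parse_date(record["measured"]),
    }


cleaned = [prepare(r) for r in raw_records]
# Count how many records ended up without a usable temperature.
missing = sum(1 for r in cleaned if r["temp_c"] is None)
```

Using `None` for unusable values, rather than dropping the records, keeps the decision about how to treat the gaps for the later exploration step.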