Everyone knows the expression “Garbage in, garbage out.” Data quality is clearly vital if you hope to get value from data. Yet, obvious as it seems, the concept is difficult to pin down. Quality is generally taken to mean “fitness for purpose.”1 Because data can be used for so many purposes and in so many different contexts, once we get beyond a few general principles it is difficult to give guidance that is not specific to the case at hand.
Consequently, this chapter differs from the ones that follow: rather than giving a single, end-to-end narrative, we examine two small case studies. These illustrate capabilities and features of JMP that are useful for addressing data quality, and that you are likely to need when assessing the quality of your own data.
The data sets used in this chapter are available at http://support.sas.com/visualsixsigma.
There are numerous frameworks for assessing data quality. Probably one of the simplest uses the following dimensions:2
In the enterprise setting, data quality is usually considered a mature topic. The investments required to build and support the large-scale IT systems that can deliver the promise of what SAS calls “The Power to Know™” necessarily imply a high degree of repetition, and the end result is often a suite of fairly simple reports or data tailored to the needs ...