Assessing the Adequacy of a Big Data Resource
Abstract
Before devoting time and energy to a data resource, the data analyst must determine whether the data is likely to be accurate, comprehensive, and representative; whether it has been organized sensibly; and whether it has been adequately annotated. The first step usually involves looking at all of the data, or at least at a large number of data samples. This chapter describes some basic approaches to examining Big Data resources.
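As an illustration of looking at a large number of data samples when the whole resource is too big to read, the following Python sketch draws a uniform random sample of lines in a single pass, using reservoir sampling. This is not a method taken from the chapter; the function name sample_lines, its parameters, and the assumption that the resource is a line-oriented text file are all hypothetical choices made for the example.

```python
import random
from typing import Iterable, List, Optional


def sample_lines(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Return up to k lines chosen uniformly at random from an iterable of lines.

    Hypothetical helper: uses reservoir sampling (Algorithm R), so the input
    is read once and never held in memory in full.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Pick a slot in [0, i]; the new line is kept only if the slot
            # falls inside the reservoir, i.e. with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = line
    return reservoir
```

An open file handle can be passed directly as the lines argument, since Python file objects iterate line by line. Opening the file with errors="replace" is one way to keep stray non-ASCII bytes from aborting the scan, at the cost of hiding exactly which bytes were malformed.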
Keywords
ASCII editor; Data assessment; Comprehensive data; Representative data; Data flattening; Full access to data
From Principles and Practice of Big Data, 2nd Edition.