Engineers have a saying: “You can only control what you can measure.” When a scientist or engineer performs a small-scale experiment in her own laboratory, she can calibrate all of her instruments, make her own measurements, check and cross-check her measurements with different instruments, compare her results with the results from other laboratories, and repeat her measurements until she is satisfied that the results are accurate and valid. When a scientist or engineer draws information from a Big Data resource, none of her customary safeguards apply. In almost all cases the experiment is too big and too complex to repeat. This chapter discusses a variety of techniques that Big Data analysts use to achieve some level ...

Source: Principles and Practice of Big Data, 2nd Edition (O’Reilly learning platform).