November 2023
Beginner to intermediate
496 pages
14h 25m
English
Even with the best and most sophisticated algorithm in the world, the outcome of any analysis will be worthless if the input data is incorrect, skewed or otherwise invalid. The old saying "garbage in, garbage out" applies with particular force to Big Data: the volumes of data are large, and so is the potential for mistakes.
Most Big Data sets contain data that is duplicated, erroneous or simply missing. Frequently, data sets have been built up over many years, during which fields in databases have been changed, renamed or simply ignored. As a result, most organizations have databases and data sets that are far from perfect in terms ...