July 2017
Beginner to intermediate
378 pages
10h 26m
English
Analytical datasets are semi-denormalized tables. By semi-denormalized, this means including not just the ID code of a field but the description for it as well. You may also decide to create categories based on value ranges and include these as separate features.
The goal is to make life easy for your analysts more than a focus on efficiently storing values, as it would be with purely relational database design. You are, essentially, prebuilding the transformed datasets that an analyst would be building using SQL to preprocess a dataset in preparation to train an ML model anyway.
As data wrangling takes 80-95% of a typical analysts time, having most of this already built, tested, and the business logic incorporated ...