September 2018
Intermediate to advanced
412 pages
11h 12m
English
Building accurate models often requires access to the latest data. Running the analysis directly on production data, however, adds overhead to the production system. The usual remedy is to set up an offline data cluster on commodity hardware that is kept in sync with the real production data cluster under eventual consistency. Analysts can then run long-running analyses and simulations on this warm data to build a machine learning model without loading the production system.
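
To make the idea of an eventually consistent offline copy concrete, here is a minimal Python sketch. It is an illustration only, not the setup of any particular database: the class names `ProductionStore` and `OfflineReplica`, the in-memory change log, and the `sync` and `lag` methods are all hypothetical. The production store records every write in a log, and the offline replica replays that log later, so analysis always reads from the replica and may briefly see stale data.

```python
from dataclasses import dataclass, field


@dataclass
class ProductionStore:
    """Hypothetical primary store: serves writes and records them in a change log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


@dataclass
class OfflineReplica:
    """Hypothetical analytics copy that replays the primary's change log on demand."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries already replayed

    def sync(self, primary):
        # Replay only the entries that have not been applied yet.
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)

    def lag(self, primary):
        # How many production writes the replica has not seen yet.
        return len(primary.log) - self.applied


primary = ProductionStore()
replica = OfflineReplica()

primary.write("user:1", 10)
primary.write("user:2", 20)
replica.sync(primary)

# A new production write arrives; the replica is stale until the next sync.
primary.write("user:1", 15)
stale_total = sum(replica.data.values())  # analysis on the warm, slightly old copy

replica.sync(primary)
fresh_total = sum(replica.data.values())  # replica has now converged with production
```

The analysis code only ever touches `replica`, so a long-running job never competes with production writes. The cost is that results can lag behind production by whatever `replica.lag(primary)` reports at the time of the read.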