July 2017
Intermediate to advanced
796 pages
18h 55m
English
As already stated, in the pre-Spark era, big data modelers typically built their ML models using statistical languages such as R, STATA, and SAS. However, this kind of workflow (that is, the execution flow of these ML algorithms) lacks efficiency, scalability, throughput, and accuracy, and it suffers from extended execution times.
Data engineers would then reimplement the same model in another language, Java for example, in order to deploy it on Hadoop. With Spark, the same ML model can be rebuilt, adopted, and deployed, which makes the whole workflow more efficient, more robust, and faster, and gives you hands-on insight for improving performance. Moreover, implementing these algorithms in Hadoop means that ...