Introducing Spark and Kafka | 139
Let us take a look at the competition. On one side, you have MATLAB and R, which are
fairly easy to use but less scalable. On the other side, there are Mahout and GraphLab,
which are more scalable, but at the cost of ease of use.
ML pipelines were officially introduced into the Spark package as an attempt to simplify
machine learning by embracing its usual flow: loading data, extracting features, training
a model and testing that trained model. All through the pipeline, a standard interface allows
tuning, testing and early failure detection.
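
To make the flow of loading data, extracting features, training and testing more concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use the Spark API. It only mimics the convention that Spark ML pipelines follow, where some stages expose `transform` and others expose `fit` and return a fitted model. The class names (`Tokenizer`, `WordScoreClassifier`, `WordScoreModel`, `Pipeline`, `PipelineModel`), the word-count scoring rule and the toy spam data are all assumptions made up for illustration.

```python
from collections import Counter

# Toy labeled rows (text, label); 1 = spam, 0 = not spam. Invented for this sketch.
TRAIN = [
    ("win cash prize now", 1),
    ("cheap prize win win", 1),
    ("meeting agenda for monday", 0),
    ("lunch on monday", 0),
]
TEST = [("win a cheap prize", 1), ("agenda for lunch meeting", 0)]


class Tokenizer:
    """Feature-extraction stage: turns raw text into a list of lowercase words."""

    def transform(self, rows):
        return [(text.lower().split(), label) for text, label in rows]


class WordScoreModel:
    """Fitted stage: predicts spam when spam-word counts outweigh non-spam counts."""

    def __init__(self, spam, ham):
        self.spam, self.ham = spam, ham

    def transform(self, rows):
        out = []
        for tokens, label in rows:
            # Counter returns 0 for words never seen during training.
            score = sum(self.spam[t] - self.ham[t] for t in tokens)
            out.append((tokens, label, 1 if score > 0 else 0))
        return out


class WordScoreClassifier:
    """Training stage: counts how often each word appears in spam and non-spam rows."""

    def fit(self, rows):
        spam, ham = Counter(), Counter()
        for tokens, label in rows:
            (spam if label == 1 else ham).update(tokens)
        return WordScoreModel(spam, ham)


class PipelineModel:
    """A fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains stages behind one fit/transform interface."""

    def __init__(self, stages):
        # Early failure detection: reject a malformed stage before touching any data.
        for stage in stages:
            if not (hasattr(stage, "fit") or hasattr(stage, "transform")):
                raise TypeError(f"{stage!r} has neither fit() nor transform()")
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)  # a trainable stage becomes a fitted model
            rows = stage.transform(rows)
            fitted.append(stage)
        return PipelineModel(fitted)


def accuracy(predictions):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(label == pred for _, label, pred in predictions) / len(predictions)


model = Pipeline([Tokenizer(), WordScoreClassifier()]).fit(TRAIN)  # train
predictions = model.transform(TEST)                                # test
print("accuracy:", accuracy(predictions))
```

Because every stage is reached through the same two methods, swapping the tokenizer or the classifier for another implementation leaves the rest of the pipeline untouched, which is the property that makes tuning and testing straightforward.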
The ML algorithms help with spam filtering, fraud detection and even recommendation
analysis. An abundance of use cases is also at the ...