Model building

A model is a representation of something: a rendering or description of reality. Just like a model of a physical building, a data science model attempts to make sense of reality; in this case, the reality is the underlying relationship between the features and the predicted variable. Models may not be 100 percent accurate, but they are still very useful for gaining insight into our business space based on the data.

Several machine learning algorithms help us model data, and Spark provides many of them out of the box. However, which model to build is still the million-dollar question. It depends on various factors, such as the interpretability-accuracy trade-off, how much data you have at hand, whether the variables are categorical or numerical, ...
