This book compared three data sets across the four supervised machine-learning algorithms that H2O offers.
In each case I first tried them with the default settings, using the minimal set of parameters: H2O API commands that comfortably fit on one line. I then tried tuning some of the many parameters that H2O offers, chose the best model (based on performance on either cross-validation or a validation data set), and evaluated that model on the unseen test data. Because the results are scattered throughout the book, I want to quickly bring them together here and see what insights are to be found. As a bonus section, I show some ideas for improving results by methods other than just parameter tuning.
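The protocol just described (fit with defaults, tune, select the best model by validation score, and only then touch the test data) can be sketched generically. The following is a minimal Python sketch of my own: it uses a toy shrunken-slope "model" on synthetic data rather than H2O, purely to show the train/validation/test discipline, and all the names in it are mine, not the book's.

```python
import random

random.seed(42)

# Toy data: y depends noisily on x. Shuffle, then split into
# training, validation, and unseen test sets.
data = [(x, 2.0 * x + random.gauss(0, 5)) for x in range(300)]
random.shuffle(data)
train, valid, test = data[:200], data[200:250], data[250:]

def fit(rows, shrink):
    """'Train' a least-squares slope, shrunk toward zero by `shrink`.
    `shrink` plays the role of a tunable hyperparameter."""
    num = sum(x * y for x, y in rows)
    den = sum(x * x for x, y in rows)
    return (num / den) * (1.0 - shrink)

def mse(rows, slope):
    """Mean squared error of the fitted slope on `rows`."""
    return sum((y - slope * x) ** 2 for x, y in rows) / len(rows)

# 1. Default settings: a single one-line fit.
default_model = fit(train, shrink=0.0)

# 2. Tuning: try several parameter values, choose the best
#    by performance on the validation set (never the test set).
candidates = [fit(train, s) for s in (0.0, 0.01, 0.05, 0.1, 0.2)]
best_model = min(candidates, key=lambda m: mse(valid, m))

# 3. Only the chosen model is evaluated on the unseen test data.
print("default test MSE:", mse(test, default_model))
print("tuned   test MSE:", mse(test, best_model))
```

The point of the structure is that the test set plays no role in model selection; it is consulted exactly once, at the end.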
By the way, in the online code you will find epilogue.*.R files that contain the code to make each of the default and tuned models, as well as some code to compare them. (You will also find timing information, and dput() output of the results, in the comments.)
This was a regression problem, and MSE was the key metric. A chart was also made for each model, using triangles to mark predictions that were more than 8% above or below the correct answer.
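For concreteness, the two quantities behind those charts can be computed as follows. This is a hedged Python sketch of my own (the helper names and the reading of "8%" as a relative band are my assumptions, not code from the book):

```python
def mse(actual, predicted):
    """Mean squared error: the key metric for this regression problem."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def triangle_marks(actual, predicted, band=0.08):
    """Classify each prediction the way the charts do (assumed semantics):
    'up' if more than `band` above the correct answer, 'down' if more
    than `band` below it, None if within the band."""
    marks = []
    for a, p in zip(actual, predicted):
        if p > a * (1 + band):
            marks.append("up")
        elif p < a * (1 - band):
            marks.append("down")
        else:
            marks.append(None)
    return marks

# Hypothetical example: 9% over, 5% under (inside the band), 12% under.
actual = [100.0, 100.0, 100.0]
predicted = [109.0, 95.0, 88.0]
print(mse(actual, predicted))           # prints 83.333...
print(triangle_marks(actual, predicted))  # prints ['up', None, 'down']
```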
This data set was sensitive to how it was split. In other words, some test data splits, and some cross-validation folds, are harder to predict than others. So to fairly compare ...