This chapter focuses on building random forests (RFs) with PySpark for classification. It also covers hyperparameter tuning to find the best set of parameters for the model. We will learn about various aspects of ensembling and how predictions are made, but before going deeper into random forests, we must cover their building block: the decision tree. A decision tree can also be used for classification or regression, but in terms of accuracy, random forests do a better ...
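To make the ensembling idea concrete, here is a minimal from-scratch sketch in plain Python (not PySpark) of how a random forest is trained and how it predicts. Each tree is fit on a bootstrap sample of the rows using a randomly chosen subset of the features, and the forest predicts by majority vote. To keep it short, the trees are depth-1 decision stumps split on Gini impurity. The toy dataset and all function names (`gini`, `fit_stump`, `fit_forest`, `predict_forest`) are hypothetical illustrations, not part of any library.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 decision tree, considering only the given feature indices.

    Returns (feature, threshold, left_label, right_label).
    """
    best = None  # (weighted impurity, feature, threshold, left_label, right_label)
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        # No possible split (one distinct value): predict the majority class.
        majority = Counter(y).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        features = rng.sample(range(d), n_features)  # random subset of feature indices
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features, both separating class 0 from class 1.
X = [[1, 1], [2, 2], [3, 1], [2, 3], [7, 8], [8, 7], [9, 9], [8, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y, n_trees=25, n_features=1, seed=0)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [10, 10]))
```

In PySpark the same ideas are exposed by `pyspark.ml.classification.RandomForestClassifier` through parameters such as `numTrees`, `maxDepth`, and `featureSubsetStrategy`, and a grid of those values can be searched with `ParamGridBuilder` and `CrossValidator` from `pyspark.ml.tuning`. Spark samples the feature subset at every tree node rather than once per tree; with single-split stumps, as here, the two coincide.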
6. Random Forests Using PySpark
Get Machine Learning with PySpark: With Natural Language Processing and Recommender Systems now with the O’Reilly learning platform.