This chapter focuses on building Random Forests (RFs) with PySpark for classification. We will learn about their various aspects and how they make predictions. Before going further into random forests, though, we need to understand their building block: the decision tree (DT). A decision tree can also be used for classification or regression, but in terms of accuracy, random forests generally outperform decision tree classifiers for several reasons that we will cover later in the chapter. Let’s learn more about decision trees.
Decision Tree
A decision tree falls under the supervised category of machine learning ...