With the encoded training and testing set ready, we can now train our classification model. We use logistic regression as an example, but there are many other classification models supported in PySpark, such as decision tree classifiers, random forests, neural networks (which we will be studying in Chapter 9, Stock Price Prediction with Regression Algorithms), linear SVM, and Naïve Bayes. For further details, please refer to the following link: https://spark.apache.org/docs/latest/ml-classification-regression.html#classification.
We train and test a logistic regression model by the following steps:
- We first import the logistic regression module and initialize a model:
>>> from pyspark.ml.classification ...