This is the phase where you consider your options for modeling the final dataset that you created in the previous phases.
In this phase, you are typically trying to address the following items:
- Determining the type of ML problem: supervised, semi-supervised, unsupervised, or reinforcement learning.
- Shortlisting ML models that suit that type of problem and the dataset.
- Agreeing on evaluation metrics and paying attention to pitfalls such as class imbalance, which can make metrics such as accuracy misleading. If the dataset is imbalanced, you can turn to resampling techniques (oversampling the minority class or undersampling the majority class) to obtain a more balanced dataset.
- Identifying the level of tolerance for false negatives and false positives.
- Thinking about how you would properly set up the cross-validation. ...
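
To make the points about imbalance, error types, and cross-validation concrete, here is a minimal sketch in plain Python. The 95/5 label split, the majority-class baseline, and the helper names (`confusion_counts`, `accuracy`, `recall`, `stratified_folds`) are all assumptions made up for illustration; in a real project you would more likely rely on a library such as scikit-learn, which provides equivalents like `recall_score` and `StratifiedKFold`.

```python
from collections import defaultdict


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted as positive."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def stratified_folds(labels, k):
    """Split indices into k folds, dealing each class out round-robin
    so every fold keeps roughly the overall class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical imbalanced dataset: 95 negatives and 5 positives.
y_true = [0] * 95 + [1] * 5
# Hypothetical baseline that always predicts the majority class.
y_baseline = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, y_baseline)
baseline_recall = recall(y_true, y_baseline)
tp, fp, fn, tn = confusion_counts(y_true, y_baseline)

folds = stratified_folds(y_true, k=5)
positives_per_fold = [sum(y_true[i] for i in fold) for fold in folds]

print(f"accuracy={baseline_accuracy:.2f} recall={baseline_recall:.2f}")
print(f"false negatives={fn} false positives={fp}")
print(f"positives per fold={positives_per_fold}")
```

The baseline scores high on accuracy while missing every positive example, which is why the tolerance for false negatives and false positives should be settled before choosing a metric. The stratified split places one positive in each fold here, so no validation fold ends up without minority-class examples. If you resample to balance the classes, apply it to the training folds only, so that the validation folds still reflect the original class distribution.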