Looking at the preceding sample of the loan data, we can see that it is not exactly in the form that we need for our classification. Specifically, we need to do the following:
- Remove non-numerical characters from the interest rate and FICO score columns.
- Encode the interest rate into two classes based on a given interest rate threshold. We will use 1.0 to represent our first class (yes, we can get the loan with that interest rate) and 0.0 to represent our second class (no, we cannot get the loan with that interest rate).
- Select a single value for the FICO credit score. We are given a range of credit scores, but we need a single value. The average, minimum, or maximum of the range is a natural choice and, in our example, ...
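
The three cleaning steps in the list above can be sketched as small helper functions. This is a minimal illustration, not the implementation used in this example: the function names are hypothetical, and it assumes that interest rates look like `8.90%`, that FICO ranges look like `735-739`, and that a rate at or below the threshold maps to class 1.0. The `how` parameter is also an assumption, included so that the minimum, maximum, or average of the range can be selected.

```python
def parse_interest_rate(raw: str) -> float:
    """Remove the non-numerical characters from a rate such as '8.90%'."""
    # Assumed input format: a decimal number followed by a percent sign.
    return float(raw.strip().rstrip("%"))


def encode_rate_class(rate: float, threshold: float) -> float:
    """Encode a rate as 1.0 (loan available at this rate) or 0.0 (not)."""
    # Assumption: a rate at or below the threshold counts as the first class.
    return 1.0 if rate <= threshold else 0.0


def fico_single_value(raw: str, how: str = "min") -> float:
    """Reduce a FICO range such as '735-739' to a single value."""
    # Assumed input format: two integers separated by a hyphen.
    low_str, high_str = raw.strip().split("-")
    low, high = float(low_str), float(high_str)
    if how == "min":
        return low
    if how == "max":
        return high
    if how == "mean":
        return (low + high) / 2.0
    raise ValueError(f"unknown reduction: {how!r}")


def clean_row(rate_raw: str, fico_raw: str, threshold: float,
              how: str = "min") -> tuple[float, float]:
    """Return (class label, FICO value) for one raw record."""
    rate = parse_interest_rate(rate_raw)
    return encode_rate_class(rate, threshold), fico_single_value(fico_raw, how)
```

For instance, with an illustrative threshold of 12.0, `clean_row("8.90%", "735-739", 12.0)` yields a class label of 1.0 together with the lower bound of the FICO range, while `clean_row("15.31%", "695-699", 12.0, how="mean")` yields 0.0 together with the midpoint of the range.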