We'll code up the strategy as follows (refer to the Credit default prediction.ipynb file in GitHub while implementing the code):
- Import the relevant packages and the dataset:
import pandas as pd
data = pd.read_csv('...') # Replace '...' with the path to the file you downloaded
The first three rows of the dataset we downloaded are as follows:
The preceding screenshot shows a subset of the variables in the original dataset. The variable named Defaultin2yrs is the output variable that we need to predict from the remaining variables in the dataset.
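As a minimal sketch of how the first rows can be inspected and the output variable separated from the predictors, the snippet below uses a small made-up DataFrame in place of the downloaded file. Only the Defaultin2yrs column name comes from the text; the age and MonthlyIncome columns and all values are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded dataset; column names other than
# Defaultin2yrs and all values are made up for illustration.
data = pd.DataFrame({
    "Defaultin2yrs": [1, 0, 0, 1],
    "age": [45, 40, 38, 30],
    "MonthlyIncome": [9120.0, 2600.0, 3042.0, 3300.0],
})

# Look at the first three rows of the dataset
first_rows = data.head(3)
print(first_rows)

# Separate the output variable from the remaining (predictor) variables
y = data["Defaultin2yrs"]
X = data.drop(columns=["Defaultin2yrs"])
print(X.columns.tolist())
```

Keeping the predictors in X and the output in y is a common convention rather than something the notebook necessarily does; it simply makes it explicit which column is being predicted.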
- Summarize the dataset to understand the variables better:
data.describe() ...
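To illustrate what the summary can reveal, here is a small sketch on a made-up DataFrame. Apart from Defaultin2yrs, the column names and all values are hypothetical. Because describe() counts only non-missing values, comparing its count row with the number of rows is one simple way to spot missing data.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded dataset; one income value is
# deliberately missing to show how describe() reports it.
data = pd.DataFrame({
    "Defaultin2yrs": [1, 0, 0, 1],
    "age": [45, 40, 38, 30],
    "MonthlyIncome": [9120.0, None, 3042.0, 3300.0],
})

# Summary statistics (count, mean, std, min, quartiles, max) per numeric column
summary = data.describe()
print(summary)

# The count row excludes missing values, so the gap between the number of
# rows and the count gives the number of missing entries per column
missing_counts = len(data) - summary.loc["count"]
print(missing_counts)
```

On the real dataset, the same comparison would indicate which variables may need imputation before modeling.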