Let's begin by loading pandas and the dataset:
- First, we'll import the pandas library:
import pandas as pd
- Let's load the Credit Approval Data Set:
data = pd.read_csv('creditApprovalUCI.csv')
- Let's calculate the fraction of missing values for each variable and sort the results in ascending order (multiply by 100 to read them as percentages):
data.isnull().mean().sort_values(ascending=True)
The output of the preceding code is as follows:
A11    0.000000
A12    0.000000
A13    0.000000
A15    0.000000
A16    0.000000
A4     0.008696
A5     0.008696
A6     0.013043
A7     0.013043
A1     0.017391
A2     0.017391
A14    0.018841
A3     0.133333
A8     0.133333
A9     0.133333
A10    0.133333
dtype: float64
- Now, we'll remove the observations with missing data in any of the variables:
data_cca = data.dropna()
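To see what these steps do without the CSV file at hand, here is a minimal sketch on a small, made-up DataFrame. The `toy` data, its values, and the names `missing_fraction`, `toy_cca`, and `retained` are assumptions for illustration only and are not part of the Credit Approval Data Set. It computes the fraction of missing values per column, drops the incomplete rows, and then checks how many observations remain, which is worth doing before committing to complete case analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real data set
toy = pd.DataFrame({
    "A1": ["a", "b", np.nan, "a"],
    "A2": [30.8, np.nan, 24.5, 27.8],
    "A3": [0.0, 4.46, 0.5, 1.54],
})

# Fraction of missing values per column, smallest first
missing_fraction = toy.isnull().mean().sort_values(ascending=True)

# Complete case analysis: keep only the rows with no missing values
toy_cca = toy.dropna()

# Share of the original rows that survive the removal
retained = len(toy_cca) / len(toy)

print(missing_fraction)
print(f"Rows before: {len(toy)}, after: {len(toy_cca)}, retained: {retained:.0%}")
```

The same row-count comparison can be run on data and data_cca from the steps above. If only a small share of the rows is retained, dropping incomplete observations may discard too much information.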