May 2019
Beginner
170 pages
4h 9m
English
Some machine learning algorithms, such as random forest, can cope with a few missing values, and in some cases we can adopt strategies such as imputing the missing values or removing the affected rows. But if the proportion of missing values in a column is high, we might need to remove the entire column. The following lines of code compute the fraction of missing values in each column of the data:
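import pandas as pd

# Fraction of missing values per column (multiply by 100 for a percentage)
na_counts = pd.DataFrame(df.isna().sum() / len(df))
na_counts.columns = ["null_row_pct"]
# Keep only columns with missing values, worst first
na_counts[na_counts.null_row_pct > 0].sort_values(by="null_row_pct", ascending=False)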
The resulting DataFrame looks as follows:
At first glance, we might be inclined to remove all rows that have missing latitude or longitude values ...
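As a rough illustration of the column-level check described earlier, the following sketch runs on a small made-up DataFrame. The data, the column names, and the 0.5 cutoff are all assumptions chosen for the example and are not taken from the book's dataset. It uses `isna().mean()`, which gives the same per-column fraction as dividing the missing count by the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values and column names are illustrative only.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "latitude": [40.7, np.nan, 41.9, np.nan],
    "longitude": [-74.0, np.nan, -87.6, np.nan],
    "comment": [np.nan, np.nan, np.nan, "ok"],
})

# Fraction of missing values per column, highest first.
null_pct = toy.isna().mean().sort_values(ascending=False)

# Assumed cutoff: drop columns where more than half of the values are missing.
THRESHOLD = 0.5
cols_to_drop = null_pct[null_pct > THRESHOLD].index.tolist()
reduced = toy.drop(columns=cols_to_drop)

print(null_pct)
print("Dropped columns:", cols_to_drop)
```

The right cutoff depends on the dataset and on how important a column is to the analysis, so the threshold here should be treated as a starting point rather than a rule.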