November 2015
Intermediate to advanced
190 pages
4h 11m
English
Now before we cheat and look at our answer key, let's see how well this solution does at predicting data it hasn't seen. To do this, I write the following fairly large test:
def final_model_cross_validation_test():
    df = pandas.read_csv('./generated_data.csv')
    df['predicted_dependent_var'] = 25.6266 \
        + 2.7083*df['ind_var_a'] \
        - 1.5527*df['ind_var_b'] \
        - 0.3917*df['ind_var_c'] \
        - 0.2006*df['ind_var_e'] \
        + 5.6450*df['ind_var_b'] * df['ind_var_c']
    df['diff'] = (df['dependent_var'] - df['predicted_dependent_var']).abs()
    print df['diff']
    print '==========='
    cv_df = pandas.read_csv('./generated_data_cv.csv')
    cv_df['predicted_dependent_var'] = 25.6266 \
        + 2.7083*cv_df['ind_var_a'] \
        - 1.5527*cv_df['ind_var_b'] \
        - 0.3917*cv_df['ind_var_c'] \
        ...
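The listing prints every per-row difference, which is hard to compare across two files at a glance. As a minimal sketch of the same check, the fitted formula can be wrapped in a function and each data set reduced to one number, the mean absolute error. The coefficients are copied from the listing. The names `predict` and `mean_abs_error` are hypothetical, rows are assumed to be plain dicts of column name to float, and the sketch assumes the same formula is applied to both the training file and the cross-validation file.

```python
# Coefficients copied from the fitted model in the text.
# Helper names below are hypothetical, chosen for illustration only.
INTERCEPT = 25.6266
COEFS = {
    "ind_var_a": 2.7083,
    "ind_var_b": -1.5527,
    "ind_var_c": -0.3917,
    "ind_var_e": -0.2006,
}
INTERACTION_BC = 5.6450  # coefficient on ind_var_b * ind_var_c


def predict(row):
    """Apply the fitted formula to one row (dict of column name -> float)."""
    linear = sum(coef * row[name] for name, coef in COEFS.items())
    interaction = INTERACTION_BC * row["ind_var_b"] * row["ind_var_c"]
    return INTERCEPT + linear + interaction


def mean_abs_error(rows):
    """Average absolute gap between observed and predicted values."""
    errors = [abs(row["dependent_var"] - predict(row)) for row in rows]
    return sum(errors) / len(errors)
```

Calling `mean_abs_error` once on the rows from the training file and once on the rows from the cross-validation file gives two numbers to compare. If the second is much larger than the first, the model is probably overfit to the data it was built on. Rows could be loaded with `csv.DictReader`, converting each value to `float` first.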