June 2020
Beginner to intermediate
380 pages
English
Chapter 5: Create a Modeling Data Set
Creating TRAIN and TEST Data Sets
Overview
It is commonly estimated that at least 80% of a data scientist’s effort goes into the extract, transform, load (ETL) stage of model development. This stage is often overlooked because it is not as exciting as applying a range of awesome algorithms to your data and evaluating your model’s performance. Yet it is critical to quality model development because of the GIGO rule: Garbage In, Garbage Out.
Nearly all data sets need to ...