Chapter 5: Create a Modeling Data Set
Creating TRAIN and TEST Data Sets
Overview
It is commonly estimated that at least 80% of a data scientist’s effort goes into the extract, transform, load (ETL) stage of model development. This stage is often overlooked because it is not as exciting as applying a range of awesome algorithms to your data and evaluating your model’s performance. Even so, it is critical for quality model development because of the GIGO rule: garbage in, garbage out. A model can only be as good as the data it is built on.
Nearly all data sets need to ...