End-to-End Data Science with SAS

by James Gearheart
June 2020
Content level: Beginner to intermediate
380 pages
11h 32m
English
SAS Institute

Chapter 5: Create a Modeling Data Set

Overview

ETL

Extract

APIs

Web Scraping

Open Source Data Sets

Data Set

Get the Data

Reduce the Size of the Data

Create a Target Variable

Creating TRAIN and TEST Data Sets

Variable Selection

Transform

Load

Chapter Review

Overview

It is commonly estimated that at least 80% of a data scientist’s effort goes into the extract, transform, load (ETL) stage of model development. This stage is often overlooked because it is not as exciting as applying a range of awesome algorithms to your data and evaluating your model’s performance. Yet the ETL process is critical for quality model development because of the GIGO rule: Garbage In, Garbage Out.

Nearly all data sets need to ...


Publisher Resources

ISBN: 9781642958065