Chapter 5: Extract, Transform, Load

5.1 Introduction

5.2 Examining Data

5.2.1 PROC CONTENTS

5.2.2 PROC FREQ

5.2.3 PROC MEANS

5.3 Encoding Translation

5.4 Conversion

5.4.1 Hexadecimal to Decimal

5.4.2 Working with Dates

5.5 Standardization

5.6 Binning

5.6.1 Quantile Binning

5.6.2 Bucket Binning

5.7 Summary

5.1 Introduction

Extract, Transform, and Load (ETL) is the process by which all source data is manipulated for downstream use in storage and analysis systems. The source can be raw data streams, flat files, staging tables, or production database tables. This stage of work is critical to ensuring that clean, usable data enters the analytical phase of our process. I have attempted ...

