Book description
Thoroughly updated for SAS 9, Cody's Data Cleaning Techniques Using SAS, Second Edition, addresses a task that nearly every SAS programmer needs to do: making sure that data errors are located and corrected. Written in Ron Cody's signature informal, tutorial style, this book develops and demonstrates data cleaning programs and macros that you can use as written or modify for your own special data cleaning needs. Each topic is developed through specific examples, and every program and macro is explained in detail.
Table of contents
- List of Programs
- Preface
- Acknowledgments
- 1 Checking Values of Character Variables
- Introduction
- Using PROC FREQ to List Values
- Description of the Raw Data File PATIENTS.TXT
- Using a DATA Step to Check for Invalid Values
- Describing the VERIFY, TRIM, MISSING, and NOTDIGIT Functions
- Using PROC PRINT with a WHERE Statement to List Invalid Values
- Using Formats to Check for Invalid Values
- Using Informats to Remove Invalid Values
- 2 Checking Values of Numeric Variables
- Introduction
- Using PROC MEANS, PROC TABULATE, and PROC UNIVARIATE to Look for Outliers
- Using an ODS SELECT Statement to List Extreme Values
- Using PROC UNIVARIATE Options to List More Extreme Observations
- Using PROC UNIVARIATE to Look for Highest and Lowest Values by Percentage
- Using PROC RANK to Look for Highest and Lowest Values by Percentage
- Presenting a Program to List the Highest and Lowest Ten Values
- Presenting a Macro to List the Highest and Lowest “n” Values
- Using PROC PRINT with a WHERE Statement to List Invalid Data Values
- Using a DATA Step to Check for Out-of-Range Values
- Identifying Invalid Values versus Missing Values
- Listing Invalid (Character) Values in the Error Report
- Creating a Macro for Range Checking
- Checking Ranges for Several Variables
- Using Formats to Check for Invalid Values
- Using Informats to Filter Invalid Values
- Checking a Range Using an Algorithm Based on Standard Deviation
- Detecting Outliers Based on a Trimmed Mean and Standard Deviation
- Presenting a Macro Based on Trimmed Statistics
- Using the TRIM Option of PROC UNIVARIATE and ODS to Compute Trimmed Statistics
- Checking a Range Based on the Interquartile Range
- 3 Checking for Missing Values
- 4 Working with Dates
- 5 Looking for Duplicates and “n” Observations per Subject
- Introduction
- Eliminating Duplicates by Using PROC SORT
- Detecting Duplicates by Using DATA Step Approaches
- Using PROC FREQ to Detect Duplicate ID’s
- Selecting Patients with Duplicate Observations by Using a Macro List and SQL
- Identifying Subjects with “n” Observations Each (DATA Step Approach)
- Identifying Subjects with “n” Observations Each (Using PROC FREQ)
- 6 Working with Multiple Files
- 7 Double Entry and Verification (PROC COMPARE)
- 8 Some PROC SQL Solutions to Data Cleaning
- Introduction
- A Quick Review of PROC SQL
- Checking for Invalid Character Values
- Checking for Outliers
- Checking a Range Using an Algorithm Based on the Standard Deviation
- Checking for Missing Values
- Range Checking for Dates
- Checking for Duplicates
- Identifying Subjects with “n” Observations Each
- Checking for an ID in Each of Two Files
- More Complicated Multi-File Rules
- 9 Correcting Errors
- 10 Creating Integrity Constraints and Audit Trails
- Introducing SAS Integrity Constraints
- Demonstrating General Integrity Constraints
- Deleting an Integrity Constraint Using PROC DATASETS
- Creating an Audit Trail Data Set
- Demonstrating an Integrity Constraint Involving More than One Variable
- Demonstrating a Referential Constraint
- Attempting to Delete a Primary Key When a Foreign Key Still Exists
- Attempting to Add a Name to the Child Data Set
- Demonstrating the Cascade Feature of a Referential Constraint
- Demonstrating the SET NULL Feature of a Referential Constraint
- Demonstrating How to Delete a Referential Constraint
- 11 DataFlux and dfPower Studio
- Appendix Listing of Raw Data Files and SAS Programs
- Programs and Raw Data Files Used in This Book
- Description of the Raw Data File PATIENTS.TXT
- Layout for the Data File PATIENTS.TXT
- Listing of Raw Data File PATIENTS.TXT
- Program to Create the SAS Data Set PATIENTS
- Listing of Raw Data File PATIENTS2.TXT
- Program to Create the SAS Data Set PATIENTS2
- Program to Create the SAS Data Set AE (Adverse Events)
- Program to Create the SAS Data Set LAB_TEST
- Listings of the Data Cleaning Macros Used in This Book
- Index
Product information
- Title: Cody's Data Cleaning Techniques Using SAS, Second Edition
- Author(s): Ron Cody
- Release date: April 2015
- Publisher(s): SAS Institute
- ISBN: 9781599948324