Book description
The key to ensuring accurate data is having clean data. This book develops and describes data cleaning programs and macros. You can use many of the provided programs and macros as is, or you can modify them for your own data cleaning tasks. Ron Cody has carefully explained and documented each program and macro, providing SAS programming instruction at an intermediate-to-advanced level. Topics include validation checks on character data, numeric data, missing values, and date values; searching for duplicate records; working with multiple files; double entry and verification using the COMPARE procedure; SQL solutions; and validation data sets. Written in Ron's signature informal, tutorial style, this book gives anyone who manages data thoroughly documented, step-by-step instructions for developing data cleaning programs and macros.
Supports releases 6.12 and higher of SAS software.
Table of contents
- Copyright
- Introduction
- Acknowledgments
- Checking Values of Character Variables
- Checking Values of Numeric Variables
- Introduction
- Using PROC MEANS, PROC TABULATE, and PROC UNIVARIATE to Look for Outliers
- Using PROC PRINT with a WHERE Statement to List Invalid Data Values
- Using a DATA Step to Check for Invalid Values
- Creating a Macro for Range Checking
- Using Formats to Check for Invalid Values
- Using Informats to Check for Invalid Values
- Using PROC UNIVARIATE to Look for Highest and Lowest Values by Percentage
- Using PROC RANK to Look for Highest and Lowest Values by Percentage
- Extending PROC RANK to Look for Highest and Lowest “n” Values
- Finding Another Way to Determine Highest and Lowest Values
- Checking a Range Using an Algorithm Based on Standard Deviation
- Macros Based on the Two Methods of Outlier Detection
- Checking a Range Based on the Interquartile Range
- Checking Ranges for Several Variables
- Checking for Missing Values
- Introduction
- Inspecting the SAS Log
- Using PROC MEANS and PROC FREQ to Count Missing Values
- Using DATA Step Approaches to Identify and Count Missing Values
- Using PROC TABULATE to Count Missing and Nonmissing Values for Numeric Variables
- Using PROC TABULATE to Count Missing and Nonmissing Values for Character Variables
- Creating a General Purpose Macro to Count Missing and Nonmissing Values for Both Numeric and Character Variables
- Searching for a Specific Numeric Value
- Working with Dates
- Looking for Duplicates and “n” Observations per Subject
- Introduction
- Eliminating Duplicates by Using PROC SORT
- Detecting Duplicates by Using DATA Step Approaches
- Using PROC FREQ to Detect Duplicate ID’s
- Selecting Patients with Duplicate Observations by Using a Macro List and SQL
- Identifying Subjects with “n” Observations Each (DATA Step Approach)
- Identifying Subjects with “n” Observations Each (Using PROC FREQ)
- Working with Multiple Files
- Double Entry and Verification (PROC COMPARE)
- Some SQL Solutions to Data Cleaning
- Introduction
- A Quick Review of PROC SQL
- Checking for Invalid Character Values
- Checking for Outliers
- Checking a Range Using an Algorithm Based on the Standard Deviation
- Checking for Missing Values
- Range Checking for Dates
- Checking for Duplicates
- Identifying Subjects with “n” Observations Each
- Checking for an ID in Each of Two Files
- More Complicated Multi-File Rules
- Using Validation Data Sets
- Introduction
- A Simple Example of a Validation Data Set
- Making the Program More Flexible and Converting It to a Macro
- Validating Character Data
- Converting Program 9-7 into a General Purpose Macro
- Extending the Validation Macro to Include Valid Character Ranges
- Combining Numeric and Character Validity Checks in a Single Macro with a Single Validation Data Set
- Introducing SAS Integrity Constraints (Versions 7 and Later)
- Listing of Raw Data Files and SAS Programs
- Description of the Raw Data File PATIENTS.TXT
- Layout for the Data File PATIENTS.TXT
- Listing of Raw Data File PATIENTS.TXT
- Program to Create the SAS Data Set PATIENTS
- Listing of Raw Data File PATIENTS2.TXT
- Program to Create the SAS Data Set PATIENTS2
- Program to Create the SAS Data Set AE (Adverse Events)
- Program to Create the SAS Data Set LAB_TEST
- Books Available from SAS® Press
- Index
Product information
- Title: Cody’s Data Cleaning Techniques Using SAS® Software
- Author(s): Ron Cody
- Release date: November 1999
- Publisher(s): SAS Institute
- ISBN: 9781580256001