Mastering SAS Programming for Data Warehousing

Book description

Build a strong foundation in SAS data warehousing by understanding data transformation code and policy, data stewardship and management, interconnectivity between SAS and other warehousing products, and print and web reporting

Key Features

  • Understand how to use SAS macros for standardizing extract, transform, and load (ETL) protocols
  • Develop and use data curation files for effective warehouse management
  • Learn how to develop and manage ETL, policies, and print and web reports that meet user needs

Book Description

SAS is used for various functions in the development and maintenance of data warehouses, thanks to its reputation of being able to handle 'big data'.

This book will help you learn the pros and cons of storing data in SAS. As you progress, you'll understand how to document and design extract-transform-load (ETL) protocols for SAS processes. Later, you'll focus on how the use of SAS arrays and macros can help standardize ETL. The book will also help you examine approaches for serving up data using SAS and explore how connecting SAS to other systems can enhance the data warehouse user's experience.

By the end of this data management book, you will have a fundamental understanding of the roles SAS can play in a warehouse environment, and be able to choose wisely when designing your data warehousing processes involving SAS.

What you will learn

  • Develop efficient ways to manage data input/output (I/O) in SAS
  • Create and manage extract, transform, and load (ETL) code in SAS
  • Standardize ETL through macro variables, macros, and arrays
  • Identify data warehouse users and ensure their needs are met
  • Design crosswalk and other variables to serve analyst needs
  • Maintain data curation files to improve communication and management
  • Use the output delivery system (ODS) for print and web reporting
  • Connect other products to SAS to optimize storage and reporting

Who this book is for

This book is for data architects, managers leading data projects, and programmers or developers using SAS who want to effectively maintain a data lake, data mart, or data warehouse.

Table of contents

  1. Mastering SAS Programming for Data Warehousing
  2. Why subscribe?
  3. Contributors
  4. About the author
  5. About the reviewer
  6. Packt is searching for authors like you
  7. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
    4. Download the example code files
    5. Download the color images
    6. Conventions used
    7. Get in touch
    8. Reviews
  8. Section 1: Managing Data in a SAS Data Warehouse
  9. Chapter 1: Using SAS in a Data Mart, Data Lake, or Data Warehouse
    1. Technical requirements
    2. Using original versions of SAS
      1. Initial SAS data handling
      2. Early SAS data handling
      3. SAS data handling improvements
    3. Accessing data in SAS
      1. Upgrading to mainframes
      2. Transitioning to personal computers
      3. Reading external files
    4. Improving I/O
      1. Developing warehouse environments
      2. Using the WHERE clause
      3. Using IF compared to WHERE
      4. Sorting in SAS
      5. Setting indexes on variables
    5. Dealing with storage and memory issues
      1. Avoiding memory issues
      2. Accommodating Structured Query Language
      3. Using PROC SQL
      4. Using SAS today in a warehouse environment
      5. Using SAS in the cloud
    6. Using SAS in modern warehousing
      1. Warehousing unstructured text
      2. Using SAS components for warehousing
      3. Using other applications with SAS
      4. Connecting to Snowflake
    7. Summary
    8. Questions
    9. Further reading
  10. Chapter 2: Reading Big Data into SAS
    1. Technical requirements
    2. Reading data extracts into SAS
      1. Understanding SAS datasets
      2. Working with the WORK directory
      3. Specifying LIBNAME
      4. Reading in SAS datasets
    3. Using the SAS XPT format
      1. Storing data in XPT format
      2. Creating an XPT file
      3. Comparing PROC CPORT/CIMPORT to PROC COPY
      4. Reading in XPTs using the XPORT engine
    4. Working with other file formats
      1. Reading non-SAS data formats
      2. Using PROC IMPORT
      3. Converting non-SAS data to SAS format
    5. Dealing with difficult data
      1. Understanding legacy data
      2. Reading data with difficult formats
      3. Specifying data locations in a fixed-width file
      4. Troubleshooting reading data after transport
    6. Summary
    7. Questions
    8. Further reading
  11. Chapter 3: Helpful PROCs for Managing Data
    1. Technical requirements
    2. PROCs for understanding data
      1. Using PROC CONTENTS to understand data
      2. Documenting SAS data with codebooks
      3. Using labels for variables
      4. Adding user-defined formats to categorical variables
      5. Using native SAS formats with numeric variables
      6. Applying user-defined formats to continuous variables
    3. Using labels and formats in processing
      1. Using PROCs with labels and formats
      2. Maintaining labels and formats
      3. Alternatives to using labels and formats in a warehouse setting
    4. Viewing data in SAS
      1. Using PROC PRINT to view data
      2. Using PROC SQL to view data
      3. Using arithmetic operators in SAS
      4. Viewing data through SAS windows
    5. Summary
    6. Questions
    7. Further reading
  12. Chapter 4: Managing ETL in SAS
    1. Technical requirements
    2. Setting up an analytic environment
      1. Designating storage and user groups
      2. Managing documentation storage
      3. Setting naming conventions for datasets
    3. Planning for data transformation
      1. Understanding arrays in SAS
      2. Setting naming conventions for variables
      3. Setting naming conventions and style for code
    4. Developing policy
      1. Setting format and label policies
      2. Setting data transfer policies
      3. Setting other policies
    5. Summary
    6. Questions
    7. Further reading
  13. Chapter 5: Managing Data Reporting in SAS
    1. Technical requirements
    2. Using the ODS for data files
      1. Identifying available tables in the ODS
      2. Identifying internal tables in the log
      3. Outputting internal tables using the ODS
    3. Using the ODS for graphics files
      1. Outputting graphics from analytic PROCs
      2. Outputting graphics in different formats
      3. Setting system options
    4. SAS PROCs designed for reporting
      1. Using PROC REPORT
      2. Understanding the basics of PROC TABULATE
      3. Preparing data for PROC TABULATE
      4. Formulating PROC TABULATE code
      5. Using PROC SGPLOT
      6. Using PROC SGPANEL and PROC SGSCATTER
      7. Using PROC TEMPLATE with PROC SGRENDER
    5. Summary
    6. Questions
    7. Further reading
  14. Section 2: Using SAS for Extract-Transform-Load (ETL) Protocols in a Data Warehouse
  15. Chapter 6: Standardizing Coding Using SAS Arrays
    1. Technical requirements
    2. Understanding examples of arrays used to create variables
      1. Scenarios where arrays are useful
      2. Arrays as temporary objects
      3. Using arrays to create variables
    3. Conditions and index variables in array processing
      1. Adding a condition to array processing
      2. Creating index variables from array outputs
      3. Documenting and standardizing array processing
    4. Limitations of arrays
      1. Naming limitations in SAS arrays
      2. Naming limitations arrays impose on data storage
      3. Difficulty in troubleshooting
    5. Summary
    6. Questions
    7. Further reading
  16. Chapter 7: Designing and Developing ETL Code in SAS
    1. Technical requirements
    2. Planning the ETL approach
      1. Specifying data with a data dictionary
      2. Understanding default PROC FREQ
      3. Using options to manipulate PROC FREQ output
      4. Using PROC UNIVARIATE for troubleshooting
      5. Using PROC FREQ to troubleshoot continuous variables
      6. Making plots for troubleshooting
      7. Choosing variables to serve to users
      8. Creating and maintaining formats for variables
    3. Creating transformation code
      1. Designing categorical grouping variables
      2. Cleaning up continuous variables
      3. Designing indicator variables
      4. Considering dates and numerical variables
      5. Exporting the transformed dataset
    4. Summary
    5. Questions
    6. Further reading
  17. Chapter 8: Using Macros to Automate ETL in SAS
    1. Technical requirements
    2. Creating macros out of data step code
      1. Choosing to use macros and macro variables
      2. Using macro variables with the %LET command
      3. Using the log file with macro variables and macros
      4. Making macros with PROCs
      5. Making macros with data steps
      6. Addition conditions to macros
    3. Storing and calling macros
      1. Storing and calling macros in the same code
      2. Storing macros separately and calling them from code
    4. Loading transformed data
    5. Summary
    6. Questions
    7. Further reading
  18. Chapter 9: Debugging and Troubleshooting in SAS
    1. Technical requirements
    2. Debugging data step code
      1. Writing well-formed and well-formatted code
      2. Using log information as guidance
      3. Troubleshooting strategies for data steps
    3. Debugging the do loop code
      1. Using the original data step debugger
      2. Using the data step debugger in SAS Enterprise Guide
    4. Debugging SAS macros
      1. Avoiding errors through the design process
      2. Using %PUT to display values of macro variables
      3. Setting system options to help with debugging macros
    5. Summary
    6. Questions
    7. Further reading
  19. Section 3: Using SAS When Serving Warehouse Data to Users
  20. Chapter 10: Considering the User Needs of SAS Data Warehouses
    1. Technical requirements
    2. Needs of data warehouse users
      1. Considering classes of data warehouse users
      2. Considering the needs of each class of users
    3. Data stewardship for serving warehouse users
      1. Providing data access
      2. Serving needs created through the warehouse structure
      3. Adding, using, and serving up foreign keys
      4. Crosswalking data over time
    4. Data stewardship for serving warehouse developers
      1. Managing a data stewardship committee
      2. Providing curation and other support
    5. Summary
    6. Questions
    7. Further reading
  21. Chapter 11: Connecting the SAS Data Warehouse to Other Systems
    1. Technical requirements
    2. Serving SAS to other systems
      1. Implementing de-identification policies
      2. Serving up a star schema
    3. Connecting to non-SAS data storage
      1. Understanding SQL views
      2. Using SAS to copy data from a remote data system
      3. Leveraging PROC SQL views for data transfer
      4. Exporting SAS data to non-SAS data storage
    4. Innovations in integrating SAS in reporting functions
    5. Summary
    6. Questions
    7. Further reading
  22. Chapter 12: Using the ODS for Visualization in SAS
    1. Technical requirements
    2. The basics of using the ODS for data visualization
      1. Using macros in reporting
      2. Connecting to data in Snowflake
    3. Serving SAS data to the web with the ODS
      1. Interacting with SAS data over the web
      2. Using the SAS Enterprise Guide
      3. Using SAS Viya
    4. Using SAS and R for visualizations
      1. Reporting SAS data in Tableau
    5. Considerations when reporting SAS warehouse data
    6. Summary
    7. Questions
    8. Further reading
  23. Assessments
    1. Chapter 1
    2. Chapter 2
    3. Chapter 3
    4. Chapter 4
    5. Chapter 5
    6. Chapter 6
    7. Chapter 7
    8. Chapter 8
    9. Chapter 9
    10. Chapter 10
    11. Chapter 11
    12. Chapter 12
  24. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think

Product information

  • Title: Mastering SAS Programming for Data Warehousing
  • Author(s): Monika Wahi
  • Release date: October 2020
  • Publisher(s): Packt Publishing
  • ISBN: 9781789532371