Introduction to Regular Expressions in SAS

Book description

Unstructured data is the most voluminous form of data in the world, and analysts rarely receive it in perfect condition for processing. In other words, you often need to clean, transform, and enhance your source data before you can use it and derive value from it, especially where textual data is concerned. In Introduction to Regular Expressions in SAS, SAS programmers of virtually all skill levels will learn how to harness the power of Regular Expressions within the SAS programming language for a wide array of everyday unstructured data analysis tasks. This book uses a practical, examples-based approach to walk you through using Regular Expressions for unstructured data processing, and it provides the foundational information and examples you need to build advanced applications. From fuzzy matching to data extraction, this book is a critical reference for any advanced analytics practitioner who needs to use SAS software to process data effectively. This book is part of the SAS Press Program.

Table of contents

  About This Book
  About The Author
  Acknowledgments
  Chapter 1: Introduction
    1.1 Purpose of This Book
    1.2 Layout of This Book
    1.3 Defining Regular Expressions
    1.4 Motivational Examples
      1.4.1 Extract, Transform, and Load (ETL)
      1.4.2 Data Manipulation
      1.4.3 Data Enrichment
  Chapter 2: Getting Started with Regular Expressions
    2.1 Introduction
      2.1.1 RegEx Test Code
    2.2 Special Characters
    2.3 Basic Metacharacters
      2.3.1 Wildcard
      2.3.2 Word
      2.3.3 Non-word
      2.3.4 Tab
      2.3.5 Whitespace
      2.3.6 Non-whitespace
      2.3.7 Digit
      2.3.8 Non-digit
      2.3.9 Newline
      2.3.10 Bell
      2.3.11 Control Character
      2.3.12 Octal
      2.3.13 Hexadecimal
    2.4 Character Classes
      2.4.1 List
      2.4.2 Not List
      2.4.3 Range
    2.5 Modifiers
      2.5.1 Case Modifiers
      2.5.2 Repetition Modifiers
    2.6 Options
      2.6.1 Ignore Case
      2.6.2 Single Line
      2.6.3 Multiline
      2.6.4 Compile Once
      2.6.5 Substitution Operator
    2.7 Zero-width Metacharacters
      2.7.1 Start of Line
      2.7.2 End of Line
      2.7.3 Word Boundary
      2.7.4 Non-word Boundary
      2.7.5 String Start
    2.8 Summary
  Chapter 3: Using Regular Expressions in SAS
    3.1 Introduction
      3.1.1 Capture Buffer
    3.2 Built-in SAS Functions
      3.2.1 PRXPARSE
      3.2.2 PRXMATCH
      3.2.3 PRXCHANGE
      3.2.4 PRXPOSN
      3.2.5 PRXPAREN
    3.3 Built-in SAS Call Routines
      3.3.1 CALL PRXCHANGE
      3.3.2 CALL PRXPOSN
      3.3.3 CALL PRXSUBSTR
      3.3.4 CALL PRXNEXT
      3.3.5 CALL PRXDEBUG
      3.3.6 CALL PRXFREE
    3.4 Summary
  Chapter 4: Applications of Regular Expressions in SAS
    4.1 Introduction
      4.1.1 Random PII Generator
    4.2 Data Cleansing and Standardization
    4.3 Information Extraction
    4.4 Search and Replacement
    4.5 Summary
      4.5.1 Start Small
      4.5.2 Think Big
  Appendix A: Perl Version Notes
  Appendix B: ASCII Code Lookup Tables
    Non-Printing Characters
    Printing Characters
  Appendix C: POSIX Metacharacters
  Index

Product information

  • Title: Introduction to Regular Expressions in SAS
  • Author(s): CAP Matthew Windham
  • Release date: December 2014
  • Publisher(s): SAS Institute
  • ISBN: 9781629594989