Effective awk Programming, 4th Edition

Book Description

This practical guide serves as both a reference and a tutorial for POSIX-standard awk and for the GNU implementation, gawk. It is useful for novices and awk experts alike. In this thoroughly revised edition, author and gawk lead developer Arnold Robbins describes the awk language and the gawk program in detail, shows you how to use awk and gawk for problem solving, and then dives into specific features of gawk.

Table of Contents

  1. Dedication
  2. Foreword to the Third Edition
  3. Foreword to the Fourth Edition
  4. Preface
    1. History of awk and gawk
    2. A Rose by Any Other Name
    3. Using This Book
    4. Typographical Conventions
      1. Dark Corners
    5. The GNU Project and This Book
    6. How to Stay Current
    7. Using Code Examples
    8. Safari® Books Online
    9. How to Contact Us
    10. Acknowledgments
  5. I. The awk Language
    1. 1. Getting Started with awk
      1. How to Run awk Programs
        1. One-Shot Throwaway awk Programs
        2. Running awk Without Input Files
        3. Running Long Programs
        4. Executable awk Programs
        5. Comments in awk Programs
        6. Shell Quoting Issues
          1. Quoting in MS-Windows batch files
      2. Datafiles for the Examples
      3. Some Simple Examples
      4. An Example with Two Rules
      5. A More Complex Example
      6. awk Statements Versus Lines
      7. Other Features of awk
      8. When to Use awk
      9. Summary
    2. 2. Running awk and gawk
      1. Invoking awk
      2. Command-Line Options
      3. Other Command-Line Arguments
      4. Naming Standard Input
      5. The Environment Variables gawk Uses
        1. The AWKPATH Environment Variable
        2. The AWKLIBPATH Environment Variable
        3. Other Environment Variables
      6. gawk’s Exit Status
      7. Including Other Files into Your Program
      8. Loading Dynamic Extensions into Your Program
      9. Obsolete Options and/or Features
      10. Undocumented Options and Features
      11. Summary
    3. 3. Regular Expressions
      1. How to Use Regular Expressions
      2. Escape Sequences
      3. Regular Expression Operators
      4. Using Bracket Expressions
      5. How Much Text Matches?
      6. Using Dynamic Regexps
      7. gawk-Specific Regexp Operators
      8. Case Sensitivity in Matching
      9. Summary
    4. 4. Reading Input Files
      1. How Input Is Split into Records
        1. Record Splitting with Standard awk
        2. Record Splitting with gawk
      2. Examining Fields
      3. Nonconstant Field Numbers
      4. Changing the Contents of a Field
      5. Specifying How Fields Are Separated
        1. Whitespace Normally Separates Fields
        2. Using Regular Expressions to Separate Fields
        3. Making Each Character a Separate Field
        4. Setting FS from the Command Line
        5. Making the Full Line Be a Single Field
        6. Field-Splitting Summary
      6. Reading Fixed-Width Data
      7. Defining Fields by Content
      8. Multiple-Line Records
      9. Explicit Input with getline
        1. Using getline with No Arguments
        2. Using getline into a Variable
        3. Using getline from a File
        4. Using getline into a Variable from a File
        5. Using getline from a Pipe
        6. Using getline into a Variable from a Pipe
        7. Using getline from a Coprocess
        8. Using getline into a Variable from a Coprocess
        9. Points to Remember About getline
        10. Summary of getline Variants
      10. Reading Input with a Timeout
      11. Directories on the Command Line
      12. Summary
    5. 5. Printing Output
      1. The print Statement
      2. print Statement Examples
      3. Output Separators
      4. Controlling Numeric Output with print
      5. Using printf Statements for Fancier Printing
        1. Introduction to the printf Statement
        2. Format-Control Letters
        3. Modifiers for printf Formats
        4. Examples Using printf
      6. Redirecting Output of print and printf
      7. Special Files for Standard Preopened Data Streams
      8. Special Filenames in gawk
        1. Accessing Other Open Files with gawk
        2. Special Files for Network Communications
        3. Special Filename Caveats
      9. Closing Input and Output Redirections
      10. Summary
    6. 6. Expressions
      1. Constants, Variables, and Conversions
        1. Constant Expressions
          1. Numeric and string constants
          2. Octal and hexadecimal numbers
          3. Regular expression constants
        2. Using Regular Expression Constants
        3. Variables
          1. Using variables in a program
          2. Assigning variables on the command line
        4. Conversion of Strings and Numbers
          1. How awk converts between strings and numbers
          2. Locales can influence conversion
      2. Operators: Doing Something with Values
        1. Arithmetic Operators
        2. String Concatenation
        3. Assignment Expressions
        4. Increment and Decrement Operators
      3. Truth Values and Conditions
        1. True and False in awk
        2. Variable Typing and Comparison Expressions
          1. String type versus numeric type
          2. Comparison operators
          3. String comparison with POSIX rules
        3. Boolean Expressions
        4. Conditional Expressions
      4. Function Calls
      5. Operator Precedence (How Operators Nest)
      6. Where You Are Makes a Difference
      7. Summary
    7. 7. Patterns, Actions, and Variables
      1. Pattern Elements
        1. Regular Expressions as Patterns
        2. Expressions as Patterns
        3. Specifying Record Ranges with Patterns
        4. The BEGIN and END Special Patterns
          1. Startup and cleanup actions
          2. Input/output from BEGIN and END rules
        5. The BEGINFILE and ENDFILE Special Patterns
        6. The Empty Pattern
      2. Using Shell Variables in Programs
      3. Actions
      4. Control Statements in Actions
        1. The if-else Statement
        2. The while Statement
        3. The do-while Statement
        4. The for Statement
        5. The switch Statement
        6. The break Statement
        7. The continue Statement
        8. The next Statement
        9. The nextfile Statement
        10. The exit Statement
      5. Predefined Variables
        1. Built-in Variables That Control awk
        2. Built-in Variables That Convey Information
        3. Using ARGC and ARGV
      6. Summary
    8. 8. Arrays in awk
      1. The Basics of Arrays
        1. Introduction to Arrays
        2. Referring to an Array Element
        3. Assigning Array Elements
        4. Basic Array Example
        5. Scanning All Elements of an Array
        6. Using Predefined Array Scanning Orders with gawk
      2. Using Numbers to Subscript Arrays
      3. Using Uninitialized Variables as Subscripts
      4. The delete Statement
      5. Multidimensional Arrays
        1. Scanning Multidimensional Arrays
      6. Arrays of Arrays
      7. Summary
    9. 9. Functions
      1. Built-in Functions
        1. Calling Built-in Functions
        2. Numeric Functions
        3. String-Manipulation Functions
          1. More about ‘\’ and ‘&’ with sub(), gsub(), and gensub()
        4. Input/Output Functions
        5. Time Functions
        6. Bit-Manipulation Functions
        7. Getting Type Information
        8. String-Translation Functions
      2. User-Defined Functions
        1. Function Definition Syntax
        2. Function Definition Examples
        3. Calling User-Defined Functions
          1. Writing a function call
          2. Controlling variable scope
          3. Passing function arguments by value or by reference
        4. The return Statement
        5. Functions and Their Effects on Variable Typing
      3. Indirect Function Calls
      4. Summary
  6. II. Problem Solving with awk
    1. 10. A Library of awk Functions
      1. Naming Library Function Global Variables
      2. General Programming
        1. Converting Strings to Numbers
        2. Assertions
        3. Rounding Numbers
        4. The Cliff Random Number Generator
        5. Translating Between Characters and Numbers
        6. Merging an Array into a String
        7. Managing the Time of Day
        8. Reading a Whole File at Once
        9. Quoting Strings to Pass to the Shell
      3. Datafile Management
        1. Noting Datafile Boundaries
        2. Rereading the Current File
        3. Checking for Readable Datafiles
        4. Checking for Zero-Length Files
        5. Treating Assignments as Filenames
      4. Processing Command-Line Options
      5. Reading the User Database
      6. Reading the Group Database
      7. Traversing Arrays of Arrays
      8. Summary
    2. 11. Practical awk Programs
      1. Running the Example Programs
      2. Reinventing Wheels for Fun and Profit
        1. Cutting Out Fields and Columns
        2. Searching for Regular Expressions in Files
        3. Printing Out User Information
        4. Splitting a Large File into Pieces
        5. Duplicating Output into Multiple Files
        6. Printing Nonduplicated Lines of Text
        7. Counting Things
      3. A Grab Bag of awk Programs
        1. Finding Duplicated Words in a Document
        2. An Alarm Clock Program
        3. Transliterating Characters
        4. Printing Mailing Labels
        5. Generating Word-Usage Counts
        6. Removing Duplicates from Unsorted Text
        7. Extracting Programs from Texinfo Source Files
        8. A Simple Stream Editor
        9. An Easy Way to Use Library Functions
        10. Finding Anagrams from a Dictionary
        11. And Now for Something Completely Different
      4. Summary
  7. III. Moving Beyond Standard awk with gawk
    1. 12. Advanced Features of gawk
      1. Allowing Nondecimal Input Data
      2. Controlling Array Traversal and Array Sorting
        1. Controlling Array Traversal
        2. Sorting Array Values and Indices with gawk
      3. Two-Way Communications with Another Process
      4. Using gawk for Network Programming
      5. Profiling Your awk Programs
      6. Summary
    2. 13. Internationalization with gawk
      1. Internationalization and Localization
      2. GNU gettext
      3. Internationalizing awk Programs
      4. Translating awk Programs
        1. Extracting Marked Strings
        2. Rearranging printf Arguments
        3. awk Portability Issues
      5. A Simple Internationalization Example
      6. gawk Can Speak Your Language
      7. Summary
    3. 14. Debugging awk Programs
      1. Introduction to the gawk Debugger
        1. Debugging in General
        2. Debugging Concepts
        3. awk Debugging
      2. Sample gawk Debugging Session
        1. How to Start the Debugger
        2. Finding the Bug
      3. Main Debugger Commands
        1. Control of Breakpoints
        2. Control of Execution
        3. Viewing and Changing Data
        4. Working with the Stack
        5. Obtaining Information About the Program and the Debugger State
        6. Miscellaneous Commands
      4. Readline Support
      5. Limitations
      6. Summary
    4. 15. Arithmetic and Arbitrary-Precision Arithmetic with gawk
      1. A General Description of Computer Arithmetic
      2. Other Stuff to Know
      3. Arbitrary-Precision Arithmetic Features in gawk
      4. Floating-Point Arithmetic: Caveat Emptor!
        1. Floating-Point Arithmetic Is Not Exact
          1. Many numbers cannot be represented exactly
          2. Be careful comparing values
          3. Errors accumulate
        2. Getting the Accuracy You Need
        3. Try a Few Extra Bits of Precision and Rounding
        4. Setting the Precision
        5. Setting the Rounding Mode
      5. Arbitrary-Precision Integer Arithmetic with gawk
      6. Standards Versus Existing Practice
      7. Summary
    5. 16. Writing Extensions for gawk
      1. Introduction
      2. Extension Licensing
      3. How It Works at a High Level
      4. API Description
        1. Introduction
        2. General-Purpose Data Types
        3. Memory Allocation Functions and Convenience Macros
        4. Constructor Functions
        5. Registration Functions
          1. Registering an extension function
          2. Registering an exit callback function
          3. Registering an extension version string
          4. Customized input parsers
          5. Customized output wrappers
          6. Customized two-way processors
        6. Printing Messages
        7. Updating ERRNO
        8. Requesting Values
        9. Accessing and Updating Parameters
        10. Symbol Table Access
          1. Variable access and update by name
          2. Variable access and update by cookie
          3. Creating and using cached values
        11. Array Manipulation
          1. Array data types
          2. Array functions
          3. Working with all the elements of an array
          4. How to create and populate arrays
        12. API Variables
          1. API version constants and variables
          2. Informational variables
        13. Boilerplate Code
      5. How gawk Finds Extensions
      6. Example: Some File Functions
        1. Using chdir() and stat()
        2. C Code for chdir() and stat()
        3. Integrating the Extensions
      7. The Sample Extensions in the gawk Distribution
        1. File-Related Functions
        2. Interface to fnmatch()
        3. Interface to fork(), wait(), and waitpid()
        4. Enabling In-Place File Editing
        5. Character and Numeric values: ord() and chr()
        6. Reading Directories
        7. Reversing Output
        8. Two-Way I/O Example
        9. Dumping and Restoring an Array
        10. Reading an Entire File
        11. Extension Time Functions
        12. API Tests
      8. The gawkextlib Project
      9. Summary
  8. IV. Appendices
    1. A. The Evolution of the awk Language
      1. Major Changes Between V7 and SVR3.1
      2. Changes Between SVR3.1 and SVR4
      3. Changes Between SVR4 and POSIX awk
      4. Extensions in Brian Kernighan’s awk
      5. Extensions in gawk Not in POSIX awk
      6. Common Extensions Summary
      7. Regexp Ranges and Locales: A Long Sad Story
      8. Major Contributors to gawk
      9. Summary
    2. B. Installing gawk
      1. The gawk Distribution
        1. Getting the gawk Distribution
        2. Extracting the Distribution
        3. Contents of the gawk Distribution
      2. Compiling and Installing gawk on Unix-Like Systems
        1. Compiling gawk for Unix-Like Systems
        2. Additional Configuration Options
        3. The Configuration Process
      3. Installation on Other Operating Systems
        1. Installation on PC Operating Systems
          1. Compiling gawk for PC operating systems
          2. Testing gawk on PC operating systems
          3. Using gawk on PC operating systems
          4. Using gawk in the Cygwin environment
          5. Using gawk in the MSYS environment
        2. Compiling and Installing gawk on Vax/VMS and OpenVMS
          1. Compiling gawk on VMS
          2. Compiling gawk dynamic extensions on VMS
          3. Installing gawk on VMS
          4. Running gawk on VMS
          5. The VMS GNV project
          6. Some VMS systems have an old version of gawk
      4. Reporting Problems and Bugs
      5. Other Freely Available awk Implementations
      6. Summary
    3. C. GNU General Public License
  9. Index
  10. Colophon
  11. Copyright