Classic Shell Scripting

by Arnold Robbins, Nelson H. F. Beebe

May 2005

Intermediate to advanced

560 pages

15h 18m

English

O'Reilly Media, Inc.

Content preview from Classic Shell Scripting

Removing Duplicates

It is sometimes useful to remove consecutive duplicate records from a data stream. We showed in Section 4.1.2 that sort -u would do that job, but we also saw that the elimination is based on matching keys rather than matching records. The uniq command provides another way to filter data: it is frequently used in a pipeline to eliminate duplicate records downstream from a sort operation:

sort ... | uniq | ...
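
Two points from the paragraph above can be checked with a short sketch: uniq compares only adjacent lines, which is why it normally follows sort, and sort -u with a key compares keys while sort | uniq compares whole records. The file pairs.txt and its contents are made up for illustration, and mktemp is assumed to be available on the system.

```shell
# Hypothetical sample data (not from the book): two fields per record.
tmpdir=$(mktemp -d)
printf '%s\n' 'b 1' 'a 1' 'b 2' 'a 1' 'b 1' > "$tmpdir/pairs.txt"

# uniq alone compares only neighboring lines. No two adjacent lines match
# in this unsorted file, so all five records pass through.
unsorted_count=$(uniq "$tmpdir/pairs.txt" | wc -l | tr -d ' ')

# sort | uniq compares whole records: 'b 1' and 'b 2' both survive.
by_record=$(sort "$tmpdir/pairs.txt" | uniq)

# sort -u with a key compares only the key field, so one record
# remains for each distinct first field.
by_key_count=$(sort -u -k1,1 "$tmpdir/pairs.txt" | wc -l | tr -d ' ')

echo "uniq on unsorted input: $unsorted_count lines"
echo "sort | uniq:"
echo "$by_record"
echo "sort -u -k1,1: $by_key_count lines"

rm -rf "$tmpdir"
```

Only the line count is reported for the keyed sort because which of the records sharing a key is kept may vary between sort implementations.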

uniq has three useful options that find frequent application. The -c option prefixes each output line with a count of the number of times that it occurred, and we will use it in the word-frequency filter in Example 5-5 in Chapter 5. The -d option shows only lines that are duplicated, and the -u option shows just the nonduplicate lines. Here are some examples:

$ cat latin-numbers                      # Show the test file
tres
unus
duo
tres
duo
tres

$ sort latin-numbers | uniq              # Show unique sorted records
duo
tres
unus

$ sort latin-numbers | uniq -c           # Count unique sorted records
      2 duo
      3 tres
      1 unus

$ sort latin-numbers | uniq -d           # Show only duplicate records
duo
tres

$ sort latin-numbers | uniq -u           # Show only nonduplicate records
unus
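
A common next step after uniq -c is to rank the records by their counts. The following is a minimal sketch of that idea using the same six-line test file; it is not the word-frequency filter from Example 5-5, and the scratch directory and the variable name ranked are illustrative assumptions.

```shell
# Recreate the six-line test file from the examples above in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' tres unus duo tres duo tres > "$workdir/latin-numbers"

# Count each distinct record, then sort on the count field, largest first.
# awk reprints the two fields so the result does not depend on how wide
# a particular uniq implementation pads its count column.
ranked=$(sort "$workdir/latin-numbers" |
         uniq -c |
         sort -k1,1nr |
         awk '{ print $1, $2 }')

echo "$ranked"

rm -rf "$workdir"
```

With this input the pipeline prints 3 tres, 2 duo, and 1 unus, one per line. Records with equal counts may come out in any order unless a secondary sort key is added.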

uniq is sometimes a useful complement to the diff utility for figuring out the differences between two similar data streams: dictionary word lists, pathnames in mirrored directory trees, telephone books, and so on. Most implementations have other options that you can find described in the manual pages for uniq(1), but their use is rare. Like sort, uniq is ...
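
The text does not spell out how uniq helps compare two data streams, so the following is only one plausible sketch: sort both lists together, then use -d for records present in both and -u for records present in just one. It assumes that neither list contains internal duplicates. The file names list1 and list2 and their contents are hypothetical.

```shell
# Hypothetical word lists, each free of internal duplicates.
work=$(mktemp -d)
printf '%s\n' alpha beta gamma > "$work/list1"
printf '%s\n' beta gamma delta > "$work/list2"

# After merging and sorting, a record found in both lists appears twice
# in a row, and a record found in only one list appears once.
in_both=$(sort "$work/list1" "$work/list2" | uniq -d)
in_one=$(sort "$work/list1" "$work/list2" | uniq -u)

echo "In both lists:"
echo "$in_both"
echo "In only one list:"
echo "$in_one"

rm -rf "$work"
```

Unlike diff, this approach ignores the original order of the records and does not say which list a lone record came from.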

ISBN: 0596005954
