Extracting URLs

Problem

You want to extract all URLs from an HTML file.

Solution

Use the HTML::LinkExtor module from CPAN:

use HTML::LinkExtor;

my $parser = HTML::LinkExtor->new(undef, $base_url);
$parser->parse_file($filename);
my @links = $parser->links;
foreach my $linkarray (@links) {
    my @element = @$linkarray;
    my $elt_type = shift @element;                  # element type

    # possibly test whether this is an element we're interested in
    while (@element) {
        # extract the next attribute and its value
        my ($attr_name, $attr_value) = splice(@element, 0, 2);
        # ... do something with them ...
    }
}
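
For instance, here is that skeleton fleshed out into a complete program; the base URL and filename are placeholders, and because a base URL is passed to new, relative links come back as absolute URI objects:

#!/usr/bin/perl -w
# print every URL found in a local HTML file
use strict;
use HTML::LinkExtor;

my $base_url = "http://www.perl.com/";   # hypothetical base for resolving relative links
my $filename = "page.html";              # hypothetical local file

my $parser = HTML::LinkExtor->new(undef, $base_url);
$parser->parse_file($filename);

foreach my $linkarray ($parser->links) {
    my ($elt_type, %attrs) = @$linkarray;
    print "$elt_type: $_\n" for values %attrs;   # URI objects stringify when printed
}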

Discussion

You can use HTML::LinkExtor in two different ways: call the links method to get a list of all the links in the document once it has been completely parsed, or pass a code reference as the first argument to new. The referenced function will then be called for each link as the document is parsed.
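
Here's a sketch of the callback style, assuming we want only the link attributes of IMG tags; the filename is again a placeholder:

use HTML::LinkExtor;

my @images;
sub callback {
    my ($tag, %attrs) = @_;              # lowercase tag name, then attr => value pairs
    return unless $tag eq 'img';         # skip everything but <IMG> tags
    push @images, values %attrs;         # src, lowsrc, ...
}

my $parser = HTML::LinkExtor->new(\&callback);
$parser->parse_file("page.html");        # hypothetical file
print "$_\n" for @images;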

The links method clears the link list, so you can call it only once per parsed document. It returns a list of elements, each of which is itself an array reference with the lowercase tag name at the front followed by a list of attribute name and attribute value pairs. (If you passed a base URL to new, the attribute values come back as absolute URI objects rather than plain strings.) For instance, the HTML:

<A HREF="http://www.perl.com/" >Home page</A>
<IMG SRC="images/big.gif" LOWSRC="images/big-lowres.gif">

would return a data structure like this:

[
  [ a,   href   => "http://www.perl.com/" ],
  [ img, src    => "images/big.gif",
         lowsrc => "images/big-lowres.gif" ]
]

Here’s an example of how you would use $elt_type and $attr_name inside that loop to print out only the links you care about.
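
A minimal sketch, assuming we want the HREF attributes of anchors and the SRC attributes of images; it drops into the while loop of the Solution in place of the "do something" comment:

if ($elt_type eq 'a'   && $attr_name eq 'href') {
    print "ANCHOR: $attr_value\n";
}
elsif ($elt_type eq 'img' && $attr_name eq 'src') {
    print "IMAGE:  $attr_value\n";
}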
