Perl in a Nutshell, 2nd Edition

by Nathan Patwardhan, Ellen Siever, Stephen Spainhour
June 2002
759 pages
English
O'Reilly Media, Inc.

The HTML Modules

HTML modules provide an interface to parse HTML documents. After you parse the document, you can print or display it according to the markup tags or extract specific information such as hyperlinks.

The HTML::Parser module provides methods for, literally, parsing HTML. It can handle HTML text from a string or a file and can separate out the syntactic structures and data. You shouldn’t use HTML::Parser directly, however, since its interface wasn’t designed to make parsing HTML easy. It’s merely a base class from which you can build your own parser to deal with HTML in any way you want. If you don’t want to roll your own HTML parser or parser class, there are HTML::TokeParser and HTML::TreeBuilder, both of which are covered in this chapter.
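As a rough illustration of using HTML::Parser as a base class, the sketch below subclasses it and overrides the start method to collect link targets. The package name LinkCollector and the helper extract_links are hypothetical names chosen for this example. It assumes the parser is constructed with no arguments, in which case HTML::Parser invokes the start, end, and text methods of the object for each piece of the document.

```perl
use strict;
use warnings;

# Hypothetical subclass, for illustration only: collects href values.
package LinkCollector;
use parent 'HTML::Parser';

# Called for every start tag when the parser is built with no arguments.
# $tag and the attribute names in %$attr arrive lowercased.
sub start {
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;
    return unless $tag eq 'a' && defined $attr->{href};
    push @{ $self->{links} }, $attr->{href};
}

package main;

# Hypothetical helper: parse a string of HTML and return its link targets.
sub extract_links {
    my ($html) = @_;
    my $parser = LinkCollector->new;
    $parser->parse($html);
    $parser->eof;    # signal end of document so buffered text is flushed
    return @{ $parser->{links} || [] };
}

print "$_\n" for extract_links('<p>See <a href="http://www.perl.com/">Perl</a>.</p>');
```

The same pattern extends to the end and text methods if the closing tags or the text between tags matter to your application.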

HTML::TreeBuilder is a class that parses HTML into a syntax tree. In a syntax tree, each element of the HTML, such as container elements with beginning and end tags, is stored relative to other elements. This preserves the nested structure and behavior of HTML and its hierarchy.

A syntax tree of the TreeBuilder class is formed of connected nodes that represent each element of the HTML document. These nodes are saved as objects from the HTML::Element class. An HTML::Element object stores all the information from an HTML tag: the start tag, end tag, attributes, plain text, and pointers to any nested elements.
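To make the tree structure concrete, here is a minimal sketch that builds a tree with HTML::TreeBuilder and reads information back from the HTML::Element nodes. The helper name link_rows and the shape of the hashes it returns are assumptions made for this example, not part of the modules.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return one hash per <a href="..."> element,
# recording the link target, its text, and the tag of the enclosing element.
sub link_rows {
    my ($html) = @_;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse($html);
    $tree->eof;                        # finish building the tree

    my @rows;
    # look_down returns every HTML::Element node matching the criteria.
    for my $anchor ($tree->look_down(_tag => 'a')) {
        my $href = $anchor->attr('href');
        next unless defined $href;
        push @rows, {
            href   => $href,
            text   => $anchor->as_text,       # plain text inside the element
            parent => $anchor->parent->tag,   # tag name of the enclosing node
        };
    }

    $tree->delete;                     # release the tree's nodes
    return @rows;
}

for my $row (link_rows('<html><body><p><a href="http://www.perl.com/">Perl</a></p></body></html>')) {
    print "$row->{href} ($row->{text}) inside <$row->{parent}>\n";
}
```

Because each node keeps a pointer to its parent and its children, the enclosing element is available directly from the node, which is the nesting information a flat token stream does not preserve.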

The remaining classes of the HTML modules use the syntax tree and its nodes of element objects ...



Publisher Resources

ISBN: 0596002416