Parsing the Data
Taking a look at exhibit.txt, we can see that it consists
of individual company listings separated by blank lines. Within each
company’s listing, the same sequence of lines occurs: the first
holds the company name, the next holds the booth number, the next
holds the street address, and so on. By splitting up the file
wherever we see a blank line, we can isolate individual
companies’ information. By counting lines within those
sections, we should be well on our way to extracting the relevant
data from the file. We can then use pattern-matching operators to
help us identify the data contained in lines that otherwise would be
ambiguous.
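Before looking at the real script, here is a minimal sketch of that approach: split on blank lines, pick fields out by position, and use a pattern match where a line needs more careful handling. It is an illustration only, not the book's make_exhibit.plx. The sample data, the "Booth NNN" line format, and the subroutine names split_listings and parse_listing are all assumptions made up for this example.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample data in the layout described above; the real
# exhibit.txt may have more lines per listing and a different booth format.
my $sample = <<'END';
Acme Widgets
Booth 101
123 Main Street

Bolt Brothers
Booth 202
456 Oak Avenue
END

# Split the text into listings wherever a blank line occurs, then
# split each listing into its individual non-empty lines.
sub split_listings {
    my ($text) = @_;
    my @listings;
    foreach my $chunk (split /\n\s*\n/, $text) {
        my @lines = grep { /\S/ } split /\n/, $chunk;
        push @listings, \@lines if @lines;
    }
    return @listings;
}

# Pick fields out by their position within the listing, and use a
# pattern match to pull the digits out of the booth line.
sub parse_listing {
    my @lines = @_;
    my %record = (
        co_name => $lines[0],
        address => $lines[2],
    );
    if (defined $lines[1] and $lines[1] =~ /(\d+)/) {
        $record{booth} = $1;
    }
    return \%record;
}

my @records;
foreach my $lines_ref (split_listings($sample)) {
    push @records, parse_listing(@$lines_ref);
}

foreach my $r (@records) {
    print "$r->{co_name}: booth $r->{booth}\n";
}
```

Counting lines by position only works while every listing follows the same sequence, which is why a pattern match is a useful check on any line whose content could otherwise be ambiguous.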
Example 5-3 shows our first version of make_exhibit.plx, the script
that will do this parsing and HTML-page creation. It introduces
several Perl features you haven’t seen before, but not to worry;
we’ll be going through them all one by one.
Example 5-3. First version of make_exhibit.plx
#!/usr/bin/perl -w

# make_exhibit.plx

# this script reads a pair of data files, extracts information
# relating to a group of tradeshow exhibitors, and writes
# out a browseable web-based directory of those exhibitors

use strict;

# configuration section:

my $exhibit_file = './exhibit.txt';

# script-wide variable:

my %listing;    # key: company name ($co_name).
                # value: HTML-ized listing for this company.

# read and parse the main exhibitor file

my @listing_lines = ( );  # holds current listing's lines for passing
                          # to the &parse_exhibitor subroutine

...