Many programs, including ps, netstat, lsof, ls -l, find -ls, and tcpdump, can produce more output than can be conveniently summarized. Logfiles also often grow too long to be easily viewed. You could send these through a filter like grep to pick out only certain lines, but regular expressions and complex logic don’t mix well; just look at the hoops we jump through in Section 6.17.
What we’d really like is to make full queries on the program output or logfile. For example, we’d like to ask ps something like, “Show me all the processes that exceed 10K in size but which aren’t running as the superuser.” Or, “Which commands are running on pseudo-ttys?”
The psgrep program does this—and infinitely more—because the specified selection criteria are not mere regular expressions; they’re full Perl code. Each criterion is applied in turn to every line of output. Only lines matching all arguments are output. The following is a list of things to find and how to find them.
Lines containing “sh” at the end of a word:
% psgrep '/sh\b/'
Processes whose command names end in “sh”:
% psgrep 'command =~ /sh$/'
Processes running with a user ID below 10:
% psgrep 'uid < 10'
Login shells with active ttys:
% psgrep 'command =~ /^-/' 'tty ne "?"'
Processes running on pseudo-ttys:
% psgrep 'tty =~ /^[p-t]/'
Non-superuser processes running detached:
% psgrep 'uid && tty eq "?"'
Huge processes that aren’t owned by the superuser:
% psgrep 'size > 10 * 2**10' 'uid != 0'
The last call to psgrep produced the following output when run on our system. As one might expect, only netscape and its spawn qualified.
FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND
0 101 9751 1 0 0 14932 9652 do_select S p1 0:25 netscape
100000 101 9752 9751 0 0 10636 812 do_select S p1 0:00 (dns helper)
Example 1-6 shows the psgrep program.
Example 1-6. psgrep
#!/usr/bin/perl -w
# psgrep - print selected lines of ps output by
#          compiling user queries into code

use strict;

# each field from the PS header
my @fieldnames = qw(FLAGS UID PID PPID PRI NICE SIZE
                    RSS WCHAN STAT TTY TIME COMMAND);

# determine the unpack format needed (hard-coded for Linux ps)
my $fmt = cut2fmt(8, 14, 20, 26, 30, 34, 41, 47, 59, 63, 67, 72);

my %fields;                         # where the data will store

die <<Thanatos unless @ARGV;
usage: $0 criterion ...
    Each criterion is a Perl expression involving:
     @fieldnames
    All criteria must be met for a line to be printed.
Thanatos

# Create function aliases for uid, size, UID, SIZE, etc.
# Empty parens on closure args needed for void prototyping.
for my $name (@fieldnames) {
    no strict 'refs';
    *$name = *{lc $name} = sub () { $fields{$name} };
}

my $code = "sub is_desirable { " . join(" and ", @ARGV) . " } ";
unless (eval $code.1) {
    die "Error in code: $@\n\t$code\n";
}

open(PS, "ps wwaxl |")              || die "cannot fork: $!";
print scalar <PS>;                  # emit header line
while (<PS>) {
    @fields{@fieldnames} = trim(unpack($fmt, $_));
    print if is_desirable();        # line matches their criteria
}
close(PS)                           || die "ps failed!";

# convert cut positions to unpack format
sub cut2fmt {
    my(@positions) = @_;
    my $template  = '';
    my $lastpos   = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    $template .= "A*";
    return $template;
}

sub trim {
    my @strings = @_;
    for (@strings) {
        s/^\s+//;
        s/\s+$//;
    }
    return wantarray ? @strings : $strings[0];
}

# the following was used to determine column cut points.
# sample input data follows
#123456789012345678901234567890123456789012345678901234567890123456789012345
#         1         2         3         4         5         6         7
# Positioning:
#       8     14    20    26  30  34     41    47          59  63  67   72
#       |     |     |     |   |   |      |     |           |   |   |    |
__END__
FLAGS UID PID PPID PRI NI SIZE RSS WCHAN STA TTY TIME COMMAND
100 0 1 0 0 0 760 432 do_select S ? 0:02 init
140 0 187 1 0 0 784 452 do_select S ? 0:02 syslogd
100100 101 428 1 0 0 1436 944 do_exit S 1 0:00 /bin/login
100140 99 30217 402 0 0 1552 1008 posix_lock_ S ? 0:00 httpd
0 101 593 428 0 0 1780 1260 copy_thread S 1 0:00 -tcsh
100000 101 30639 9562 17 0 924 496 R p1 0:00 ps axl
0 101 25145 9563 0 0 2964 2360 idetape_rea S p2 0:06 trn
100100 0 10116 9564 0 0 1412 928 setup_frame T p3 0:00 ssh -C www
100100 0 26560 26554 0 0 1076 572 setup_frame T p2 0:00 less
100000 101 19058 9562 0 0 1396 900 setup_frame T p1 0:02 nvi /tmp/a
The psgrep program integrates many techniques presented throughout this book. Stripping strings of leading and trailing whitespace is found in Section 1.14. Converting cut marks into an unpack format to extract fixed fields is in Section 1.1. Matching strings with regular expressions is the entire topic of Chapter 6. The multiline string in the here document passed to die is discussed in Section 1.10 and Section 1.11. The assignment to @fields{@fieldnames} sets many values at once in the hash named %fields. Hash slices are discussed in Section 4.7 and Section 5.10.
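As a small illustration of how the cut positions and the hash slice work together, here is a sketch that uses the same cut2fmt logic on a made-up three-column record. The field names NAME, UID, and SHELL, the cut positions 9 and 15, and the sample line are assumptions chosen for the example, not part of psgrep.

```perl
use strict;
use warnings;

# Same logic as psgrep's helper: turn 1-based cut positions
# into an unpack template of fixed-width "A" fields.
sub cut2fmt {
    my @positions = @_;
    my $template  = '';
    my $lastpos   = 1;
    for my $place (@positions) {
        $template .= "A" . ($place - $lastpos) . " ";
        $lastpos   = $place;
    }
    return $template . "A*";
}

# Hypothetical layout: NAME starts at column 1, UID at 9, SHELL at 15.
my @names = qw(NAME UID SHELL);
my $fmt   = cut2fmt(9, 15);

# Build a sample line with exactly those column widths.
my $line = sprintf("%-8s%-6s%s", "root", 0, "/bin/sh");

my %rec;
@rec{@names} = unpack($fmt, $line);   # hash slice: three assignments at once

print "template: $fmt\n";
print "$_=$rec{$_}\n" for @names;
```

The "A" format strips trailing spaces from each field, which is why this toy layout needs no trimming; psgrep still calls trim because ps right-justifies its numeric columns, leaving leading spaces.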
The sample program input contained beneath __END__ is described in Section 7.6. During development, we used canned input from the DATA filehandle for testing purposes. Once the program worked properly, we changed it to read from a piped-in ps command but left a remnant of the original filter input to aid in future porting and maintenance. Launching other programs over a pipe is covered in Chapter 16, including Section 16.10 and Section 16.13.
The real power and expressiveness in psgrep derive from Perl’s use of string arguments not as mere strings but directly as Perl code. This is similar to the technique in Section 9.9, except that in psgrep, the user’s arguments are wrapped with a routine called is_desirable. That way, the cost of compiling strings into Perl code happens only once, before the program whose output we’ll process is even begun. For example, asking for UIDs under 10 creates this string to eval:
eval "sub is_desirable { uid < 10 } " . 1;
The mysterious ".1" at the end is so that if the user code compiles, the whole eval returns true. That way we don’t even have to check $@ for compilation errors as we do in Section 10.12.
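To see the compile-once idea and the trailing 1 in isolation, consider this sketch. The helper name compile_criteria and the package variable $uid are assumptions made for the example; psgrep itself uses field functions rather than a variable, and does the eval inline.

```perl
use strict;
use warnings;

our $uid;    # stand-in for a psgrep field, declared so the criteria pass strict

# Hypothetical helper: compile all criteria into one named sub, once.
# Returns true on success; on failure it returns false and $@ says why.
sub compile_criteria {
    my @criteria = @_;
    my $code = "sub is_desirable { " . join(" and ", @criteria) . " } ";
    return eval($code . "1");   # the trailing 1 is the eval's value if the sub compiles
}

my $bad_ok  = compile_criteria('$uid <');              # deliberately broken
my $bad_err = $@;
my $good_ok = compile_criteria('$uid < 10', '$uid != 0');

print "broken criterion rejected\n" if !$bad_ok && $bad_err;

if ($good_ok) {
    for my $candidate (0, 5, 50) {
        $uid = $candidate;
        print "uid $uid: ", (is_desirable() ? "selected" : "skipped"), "\n";
    }
}
```

Because the sub is compiled before any records are read, each later call to is_desirable costs no more than an ordinary function call.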
Specifying arbitrary Perl code in a filter to select records is a breathtakingly powerful approach, but it’s not entirely original. Perl owes much to the awk programming language, which is often used for such filtering. One problem with awk is that it can’t easily treat input as fixed-size fields instead of fields separated by something. Another is that the fields are not mnemonically named: awk uses $1, $2, etc. Plus Perl can do much that awk cannot.
The user criteria don’t even have to be simple expressions. For example, this call initializes a variable $id to user nobody’s number to use later in its expression:
% psgrep 'no strict "vars"; BEGIN { $id = getpwnam("nobody") } uid == $id '
How can we use unquoted words without even a dollar sign, like uid, command, and size, to represent those respective fields in each input record? We directly manipulate the symbol table by assigning closures to indirect typeglobs, which creates functions with those names. The function names are created using both uppercase and lowercase names, allowing both "UID < 10" and "uid < 10". Closures are described in Section 11.4, and assigning them to typeglobs to create function aliases is shown in Section 10.14.
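The typeglob assignment can be tried on its own with a couple of fields. In this sketch the hash contents are made-up values; the loop body is the same statement psgrep uses.

```perl
use strict;
use warnings;

# Made-up record; in psgrep this hash is refilled for every line of ps output.
my %fields = (UID => 101, TTY => 'p1');

for my $name (keys %fields) {
    no strict 'refs';
    # Install one closure under the lowercase name, then alias the
    # uppercase typeglob to it, so uid() and UID() are the same function.
    *$name = *{lc $name} = sub () { $fields{$name} };
}

print "uid is ", uid(), ", TTY is ", TTY(), "\n";

# The closures read %fields at call time, so new data shows up immediately.
$fields{UID} = 0;
print "uid is now ", uid(), "\n";
```

The calls here use explicit parentheses because these lines are compiled before the loop has run. In psgrep the user's criteria are compiled by eval after the functions exist, so there the bare names work.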
One twist here not seen in those recipes is the empty parentheses on the closure. These allow us to use the function in an expression anywhere we’d use a single term, like a string or a numeric constant. They create a void prototype so the field-accessing function named uid accepts no arguments, just like the built-in function time. If these functions weren’t prototyped void, expressions like "uid < 10" or "size / 2 > rss" would confuse the parser because it would see the unterminated start of a wildcard glob and of a pattern match, respectively. Prototypes are discussed in Section 10.11.
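The effect of the empty prototype is easy to observe by compiling the same expression against two versions of an accessor. In this sketch the function bulk is a hypothetical twin of size declared without a prototype, and the field values are made up.

```perl
use strict;
use warnings;

my %fields = (SIZE => 3000, RSS => 1000);

sub size () { $fields{SIZE} }   # empty prototype: parsed like a term
sub rss  () { $fields{RSS}  }
sub bulk    { $fields{SIZE} }   # hypothetical twin of size, no prototype

# With the prototype, "/" after size is ordinary division.
my $with = eval 'size / 2 > rss';

# Without it, bulk is a list operator looking for arguments, so "/"
# is taken as the start of a pattern match that never terminates.
my $without = eval 'bulk / 2 > rss';
my $error   = $@;

print "with prototype:    ", ($with ? "true" : "false"), "\n";
print "without prototype: ",
      (defined $without ? "compiled" : "failed to compile"), "\n";
```

The second eval fails at compile time and leaves the parser's complaint in $error, which is the kind of message a psgrep user would see if the prototypes were missing.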
The version of psgrep demonstrated here expects the output from Red Hat Linux’s ps. To port to other systems, look at which columns the headers begin at. This approach isn’t relevant only to ps or only to Unix systems. It’s a generic technique for filtering input records using Perl expressions, easily adapted to other record layouts. The input format could be in columns, space separated, comma separated, or the result of a pattern match with capturing parentheses.
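As a sketch of such an adaptation, the following applies the same technique to comma-separated records, swapping unpack for split. The field names LOGIN, UID, and SHELL, the sample records, and the fixed criteria are all assumptions for illustration; a real filter would take its criteria from @ARGV and its records from a file or pipe.

```perl
use strict;
use warnings;

# Hypothetical comma-separated layout: login,uid,shell
my @fieldnames = qw(LOGIN UID SHELL);
my %fields;

# Same aliasing trick as psgrep: login/LOGIN, uid/UID, shell/SHELL.
for my $name (@fieldnames) {
    no strict 'refs';
    *$name = *{lc $name} = sub () { $fields{$name} };
}

# Criteria would normally come from @ARGV; fixed here for illustration.
my @criteria = ('uid >= 100', 'shell =~ /sh$/');
my $code = "sub is_desirable { " . join(" and ", @criteria) . " } ";
eval($code . 1) or die "Error in code: $@\n\t$code\n";

my @records = (
    "root,0,/bin/sh",
    "tchrist,101,/bin/tcsh",
    "nobody,99,/bin/false",
);

my @matches;
for (@records) {
    @fields{@fieldnames} = split /,/;     # only the parsing step changes
    push @matches, $_ if is_desirable();
}
print "$_\n" for @matches;
```

Only the line that fills %fields knows about the record layout, so moving to another format means changing that one statement and the list of field names.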
The program could even be modified to handle a user-defined database with a small change to the selection functions. If you had an array of records as described in Section 11.9, you could let users specify arbitrary selection criteria, such as:
sub id()         { $_->{ID}    }
sub title()      { $_->{TITLE} }
sub executive()  { title =~ /(?:vice-)?president/i }

# user search criteria go in the grep clause
@slowburners = grep { id < 10 && !executive } @employees;
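To let the criteria arrive as a string rather than being typed into the grep block, one option is to compile them into an anonymous subroutine first. This is a sketch under assumptions: the employee records, the NAME field, and the variable names $criteria and $wanted are invented for the example.

```perl
use strict;
use warnings;

sub id ()        { $_->{ID} }
sub title ()     { $_->{TITLE} }
sub executive () { title =~ /(?:vice-)?president/i }

# Made-up records in the array-of-hashes style.
my @employees = (
    { ID => 3,  NAME => "Ann", TITLE => "Vice-President" },
    { ID => 7,  NAME => "Bob", TITLE => "Programmer"     },
    { ID => 42, NAME => "Cy",  TITLE => "Programmer"     },
);

# The criteria string would come from the user; it is compiled once.
my $criteria = 'id < 10 && !executive';
my $wanted   = eval "sub { $criteria }"
    or die "Error in criteria: $@\n";

# grep aliases $_ to each record, and the accessors read from $_.
my @slowburners = grep { $wanted->() } @employees;

print "$_->{NAME}\n" for @slowburners;
```

As with psgrep, this runs whatever Perl the user supplies, so it belongs only where the person writing the criteria is already trusted to run code.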
For reasons of security and performance, this kind of power is seldom found in database engines like those described in Chapter 14. SQL doesn’t support this, but given Perl and a small bit of ingenuity, it’s easy to roll your own. The search engine at http://mox.perl.com/cgi-bin/MxScreen uses such a technique, but instead of output from ps, its records are Perl hashes loaded from a database.