Strategies for Unix process control offer another multiple-choice situation. Luckily, these choices aren’t nearly as complex to introduce as those offered by NT. When we speak of process control under Unix, we’re referring to three operations:
Enumerating the list of running processes on a machine
Changing their priority or process group
Terminating the processes
For the final two of these operations, there are Perl functions to do the job: setpriority( ), setpgrp( ), and kill( ). The first one offers us a few options.
To list running processes, you can:
Call an external program like ps
Take a crack at deciphering /dev/kmem
Look through the /proc filesystem
Use the Proc::ProcessTable module
Let’s discuss each of these approaches. For the impatient reader, I’ll reveal right now that Proc::ProcessTable is my preferred technique, and you may want to skip directly to the discussion of that module. But I recommend reading about the other techniques anyway, since they may come in handy in the future.
Common to all modern Unix variants is a program called ps, used to list running processes. However, ps is found in different places in the filesystem on different Unix variants, and the command-line switches it takes are not consistent across variants either. Therein lies one problem with this option: it lacks portability.
An even more annoying problem is the difficulty in parsing the output (which also varies from variant to variant). Here’s a snippet of output from ps on a SunOS machine:
USER       PID %CPU %MEM   SZ  RSS TT STAT START   TIME COMMAND
dnb        385  0.0  0.0  268    0 p4 IW   Jul  2  0:00 /bin/zsh
dnb      24103  0.0  2.610504 1092 p3 S    Aug 10 35:49 emacs
dnb        389  0.0  2.5 3604 1044 p4 S    Jul  2 60:16 emacs
remy     15396  0.0  0.0  252    0 p9 IW   Jul  7  0:01 -zsh (zsh)
sys        393  0.0  0.0   28    0 ?  IW   Jul  2  0:02 in.identd
dnb      29488  0.0  0.0   68    0 p5 IW   20:15   0:00 screen
dnb      29544  0.0  0.4   24  148 p7 R    20:39   0:00 less
dnb       5707  0.0  0.0  260    0 p6 IW   Jul 24  0:00 -zsh (zsh)
root     28766  0.0  0.0  244    0 ?  IW   13:20   0:00 -:0 (xdm)
Notice the third line. Two of the columns have run together, making the parsing of this output an annoying task. It’s not impossible, just annoying. Some Unix variants are kinder than others in this regard, but it is something you may have to take into account.
The Perl code required for this option is straightforward: open( ) to run ps, while(<FH>){...} to read the output, and split( ), unpack( ), or substr( ) to parse it. A recipe for this can be found in the Perl Cookbook by Tom Christiansen and Nathan Torkington (O’Reilly).
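As a rough illustration of the open( )/while/split( ) pattern just described, here is a minimal sketch. It assumes a POSIX-style ps that accepts the -e and -o switches (not every variant does, which is exactly the portability problem noted above), and the sub names parse_ps_line and list_processes are made up for this example. Asking ps for only three columns sidesteps most of the run-together column trouble, though it does not remove it entirely.

```perl
use strict;
use warnings;

# Split one "pid user command" line into a hash reference.
# Returns nothing for lines that do not begin with a numeric PID.
sub parse_ps_line {
    my ($line) = @_;
    chomp $line;
    # split ' ' skips leading whitespace; the limit of 3 keeps any
    # spaces inside the command field intact
    my ($pid, $user, $command) = split(' ', $line, 3);
    return unless defined $command && $pid =~ /^\d+$/;
    return { pid => $pid, user => $user, command => $command };
}

# Run ps and collect one record per process.
# ASSUMPTION: a POSIX-style ps; "-o name=" prints that column without a header.
sub list_processes {
    open(my $ps, '-|', 'ps', '-e', '-o', 'pid=', '-o', 'user=', '-o', 'comm=')
        or die "Unable to run ps: $!\n";
    my @processes = map { parse_ps_line($_) } <$ps>;
    close($ps);
    return @processes;
}

# Parse a sample line; call list_processes( ) to query the live system instead.
my $sample = parse_ps_line("  385 dnb      /bin/zsh\n");
print "$sample->{pid}\t$sample->{user}\t$sample->{command}\n";
```

On a system with a different ps, only the argument list in list_processes( ) and possibly the field order in parse_ps_line( ) should need to change.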
I mention the /dev/kmem option only for completeness’ sake. It is possible to write code that opens a device like /dev/kmem and accesses the running kernel’s memory structures. With this access, you can track down the current process table in memory and read it. Given the pain involved (taking apart complex binary structures by hand) and its extreme non-portability (just a version difference within the same operating system is likely to break your program), I’d strongly recommend against using this option.
If you decide not to heed this advice, you should begin by memorizing the Perl documentation for pack( ) and unpack( ), along with the header files for your kernel. Open the kernel memory file (often /dev/kmem), then read( ) and unpack( ) to your heart’s content. You may find it instructive to look at the source for programs like top (found at ftp://ftp.groupsys.com/pub/top) that perform this task using a great deal of C code. Our next option offers a slightly better version of this method.
One of the more interesting additions to Unix found in most of the current variants is the /proc filesystem. This is a magical filesystem that has nothing to do with data storage. It provides a file-based interface for the running process table of a machine. A “directory” named after the process ID appears in this filesystem for each running process. In this directory are a set of “files” that provide information about that process. One of these files can be written to, thus allowing control of this process.
It is a really clever concept, and that’s the good news. The bad news is that each Unix vendor/developer team decided to take this clever concept and run with it in a different direction. As a result, the files found in a /proc directory are often variant-specific, both in name and format. For a description of which files are available and what they contain, you will need to see the manual pages (usually found in sections 4, 5, or 8) for procfs or mount_procfs on your system.
The one fairly portable use of the /proc filesystem is the enumeration of running processes. If we want to list just the process IDs and their owners, we can use Perl’s directory and lstat( ) operators:
opendir(PROC,"/proc") or die "Unable to open /proc:$!\n";
while (defined($_= readdir(PROC))){
    next if ($_ eq "." or $_ eq "..");
    next unless /^\d+$/; # filter out any random non-pid files
    print "$_\t". getpwuid((lstat "/proc/$_")[4])."\n";
}
closedir(PROC);
If you are interested in more information about a process, you will have to open and unpack( ) the appropriate binary file in the /proc directories. Common names for this file are status and psinfo. The manual pages cited a moment ago should provide details about the C structure found in this file or at least a pointer to a C include file that documents this structure. Because these are operating system- (and OS version-) specific formats, you are still going to run into the program fragility mentioned in our previous option.
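To give a feel for what the unpack( ) step involves, here is a minimal sketch. The record layout is entirely hypothetical (it is not the psinfo or status structure of any real system), and the name decode_fakeinfo is invented for the example; a real program would take its template from the C structure documented in your system’s manual pages or include files.

```perl
use strict;
use warnings;

# HYPOTHETICAL layout, for illustration only (not a real kernel structure):
#   struct fakeinfo { int32_t pid; int32_t uid; char fname[16]; };
my $template = 'l l Z16';   # two signed 32-bit ints, then a NUL-padded string

# Decode one record of the hypothetical structure into a hash reference.
sub decode_fakeinfo {
    my ($buffer) = @_;
    my ($pid, $uid, $fname) = unpack($template, $buffer);
    return { pid => $pid, uid => $uid, fname => $fname };
}

# Build a sample record with pack( ) so the example runs anywhere,
# then take it apart again.
my $record = pack($template, 385, 1001, 'zsh');
my $info   = decode_fakeinfo($record);
print "$info->{pid}\t$info->{uid}\t$info->{fname}\n";
```

With a real file, the buffer would come from open( ), binmode( ), and read( ) on something like /proc/<pid>/psinfo rather than from pack( ). If a single field in the structure changes size between OS releases, the template silently decodes garbage, which is the fragility described above.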
You may be feeling discouraged at this point because all of our options so far look like they require code with lots of special cases, one for each version of each operating system we wish to support. Luckily, we have one more option up our sleeve that may help in this regard.
Daniel J. Urist (with the help of some volunteers) has been kind enough to write a module called Proc::ProcessTable that offers a consistent interface to the process table for the major Unix variants. It hides the vagaries of the different /proc or kmem implementations for you, allowing you to write relatively portable code.
Simply load the module, create a Proc::ProcessTable object, and run methods from that object:
use Proc::ProcessTable;
$tobj = new Proc::ProcessTable;
This object uses Perl’s tied variable functionality to present a real-time view of the system. You do not need to call a special function to refresh the object; each time you access it, it re-reads the process table. This is similar to the %Process hash we saw in our Mac::Processes discussion earlier in this chapter.
To get at this information, you call the object method table( ):
$proctable = $tobj->table( );
table( ) returns a reference to an array with members that are references to individual process objects. Each of these objects has its own set of methods that returns information about that process. For instance, here’s how you would get a listing of the process IDs and owners:
use Proc::ProcessTable;

$tobj = new Proc::ProcessTable;
$proctable = $tobj->table( );
for (@$proctable){
    print $_->pid."\t". getpwuid($_->uid)."\n";
}
If you want to know which process methods are available on your Unix variant, the fields( ) method of your Proc::ProcessTable object ($tobj above) will return a list for you. Proc::ProcessTable also adds three other methods to each process object, kill( ), priority( ), and pgrp( ), which are just frontends to the built-in Perl functions we mentioned at the beginning of this section.
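For reference, here is a minimal sketch of those built-in functions at work on their own, without the module. It forks a throwaway child so that it has a process it is allowed to control. The value 0 for PRIO_PROCESS is an assumption that holds on the common Unix variants, and the nice value of 10 is arbitrary.

```perl
use strict;
use warnings;

# ASSUMPTION: PRIO_PROCESS is 0, as it is on the common Unix variants.
my $PRIO_PROCESS = 0;

# Fork a throwaway child so the example has a process of its own to control.
defined(my $pid = fork()) or die "Unable to fork: $!\n";
if ($pid == 0) {
    sleep 60;    # child: idle until the parent signals it
    exit 0;
}

# Lower the child's priority (a larger nice value means a lower priority).
setpriority($PRIO_PROCESS, $pid, 10) or die "Unable to set priority: $!\n";
my $nice = getpriority($PRIO_PROCESS, $pid);

# Move the child into a process group of its own.
setpgrp($pid, $pid) or warn "Unable to set process group: $!\n";
my $pgrp = getpgrp($pid);

# Terminate the child and reap it; kill( ) returns the number of
# processes successfully signalled.
my $signalled = kill('TERM', $pid);
waitpid($pid, 0);

print "child $pid: nice=$nice pgrp=$pgrp signalled=$signalled\n";
```

An unprivileged user can generally only lower the priority of, and send signals to, processes they own, which is why the sketch operates on its own child rather than on an arbitrary PID.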
To bring us back to the big picture, let’s look at some of the uses of these process control techniques. We started to examine process control in the context of user actions, so let’s look at a few teeny scripts that focus on these actions. We’ll use the Proc::ProcessTable module on Unix for these examples, but these ideas are not operating system specific.
The first example is from the documentation for Proc::ProcessTable:
use Proc::ProcessTable;

$t = new Proc::ProcessTable;
foreach $p (@{$t->table}){
    if ($p->pctmem > 95){
        $p->kill(9);
    }
}
This code will shoot down any process consuming more than 95% of that machine’s memory when run on the Unix variants that provide the pctmem( ) method (most do). As it stands, this code is probably too ruthless to be used in real life. It would be much more reasonable to add something like this before the kill( ) call:
print "about to nuke ".$p->pid."\t". getpwuid($p->uid)."\n";
print "proceed? (yes/no) ";
chomp($ans = <>);
next unless ($ans eq "yes");
There’s a bit of a race condition here: it is possible that the system state will change during the delay induced by prompting the user. Given that we are only prompting for huge processes, and huge processes are those least likely to change state in a short amount of time, we’re probably fine coding this way. If you wanted to be pedantic, you would collect the list of processes to be killed first, prompt for input, and then recheck the state of the process table before actually killing the desired processes.
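The pedantic version might be sketched as follows. This is one possible arrangement rather than anything taken from the module’s documentation: the sub name confirmed_victims and its callback arguments are invented for the example, and the only things it assumes of a process object are the pid( ), fname( ), and pctmem( ) methods used earlier.

```perl
use strict;
use warnings;

# Return the processes that were over the limit, were approved by $confirm,
# and are still the same process and still over the limit on a fresh read.
# $get_table: code ref returning a reference to an array of process objects
# $confirm:   code ref called with (pid, name); true means "go ahead"
# $limit:     memory percentage above which a process is a candidate
sub confirmed_victims {
    my ($get_table, $confirm, $limit) = @_;

    # Pass 1: collect candidates without touching anything.
    my %candidate = map  { ($_->pid, $_->fname) }
                    grep { $_->pctmem > $limit } @{ $get_table->() };

    # Prompt once per candidate; the table may go stale while we wait.
    my %approved = map  { ($_, $candidate{$_}) }
                   grep { $confirm->($_, $candidate{$_}) }
                   sort { $a <=> $b } keys %candidate;

    # Pass 2: re-read the table and keep only processes that still qualify.
    return grep {
        my $name = $approved{ $_->pid };
        defined $name && $name eq $_->fname && $_->pctmem > $limit;
    } @{ $get_table->() };
}
```

With Proc::ProcessTable, the first argument would be something like sub { $t->table }, the second a sub that prints the prompt shown above and returns true when the answer is "yes", and the caller would then invoke kill(9) on each object returned. Comparing the name as well as the PID guards against a PID being reused by a different program while the prompt is waiting; a small window still remains between the second read and the kill.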
There are times when death is too good for a process. Sometimes it is important to notice a process is running while it is running so real life action (like “user attitude correction”) can be taken. For example, at our site we have a policy against the use of Internet Relay Chat bots. Bots are daemon processes that connect to an IRC network of chat servers and perform automated actions. Though bots can be used for constructive purposes, these days they play a mostly antisocial role on IRC. We’ve also had security breaches come to our attention because the first (and often only) thing the intruder has done is put up an IRC bot of some sort. As a result, noting their presence on our system without killing them is important to us.
The most common bot by far is called eggdrop. If we wanted to look for this process name being run on our system, we could use code like this:
use Proc::ProcessTable;

# $logfile must already hold the path of the log file to append to
open(LOG,">>$logfile") or die "Can't open logfile for append:$!\n";

$t = new Proc::ProcessTable;
foreach $p (@{$t->table}){
    if ($p->fname( ) =~ /eggdrop/i){
        print LOG time."\t".getpwuid($p->uid)."\t".$p->fname( )."\n";
    }
}
close(LOG);
If you are thinking, “This code is not good enough! All someone has to do is rename the eggdrop executable to evade its check,” you’re absolutely right. We’ll take a stab at writing less naïve bot-check code in the very last section of this chapter.
In the meantime, let’s see one more example where Perl assists us in managing user processes. So far all of our examples have been fairly negative. We’ve seen code that deals with resource-hungry and naughty processes. Let’s look at something with a sunnier disposition.
There are times when a system administrator needs to know which (legitimate) programs are being used by users on a system. Sometimes this is necessary in the context of software metering where there are legal concerns about the number of users running a program concurrently. In those cases there is usually a licensing mechanism in place to handle the bean counting. Another situation where this knowledge comes in handy is that of machine migration. If you are migrating a user population from one architecture to another, you’ll want to make sure all the programs used on the previous architecture are available on the new one.
One approach to solving this problem involves replacing every non-OS binary available to users with a wrapper that first records that a particular binary has been run and then actually runs it. This can be difficult to implement if there are a large number of binaries. It also has the unpleasant side effect of slowing down every program invocation.
If precision is not important and a rough estimate of which binaries are in use will suffice, then we can use Proc::ProcessTable to solve this problem as well. Here’s some code that wakes up every five minutes and surveys the current process landscape. It keeps a simple count of all of the process names it finds, and is smart enough not to count a process twice if it already saw that process during its last period of wakefulness. Every hour it prints its findings and starts collecting again. We wait five minutes between each run because walking the process table is usually a resource-intensive operation and we’d prefer this program add as little load to the system as possible:
use Proc::ProcessTable;

$interval   = 300; # sleep interval of 5 minutes
$partofhour = 0;   # keep track of where in hour we are

$tobj = new Proc::ProcessTable; # create new process object

# forever loop, collecting stats every $interval secs
# and dumping them once an hour
while(1){
    &collectstats;
    &dumpandreset if ($partofhour >= 3600);
    sleep($interval);
}

# collect the process statistics
sub collectstats {
    my($process);
    my(@last); # pid/name pairs seen during this run only
    foreach $process (@{$tobj->table}){
        # we should ignore ourselves
        next if ($process->pid( ) == $$);

        # save this process info for our next run
        push(@last,$process->pid( ),$process->fname( ));

        # ignore this process if we saw it last iteration
        next if ($last{$process->pid( )} eq $process->fname( ));

        # else, remember it
        $collection{$process->fname( )}++;
    }
    # set the last hash using the current table for our next run
    %last = @last;
    $partofhour += $interval;
}

# dump out the results and reset our counters
sub dumpandreset{
    print scalar localtime(time).("-"x50)."\n";
    for (sort reverse_value_sort keys %collection){
        write;
    }
    undef %collection;
    $partofhour = 0;
}

# (reverse) sort by values in %collection and by key name
sub reverse_value_sort{
    return $collection{$b} <=> $collection{$a} || $a cmp $b;
}

format STDOUT =
@<<<<<<<<<<<<< @>>>>
$_,            $collection{$_}
.

format STDOUT_TOP =
Name           Count
-------------- -----
.
There are many ways this program could be enhanced. It could track processes on a per-user basis (i.e., only recording one instance of a program launch per user), collect daily stats, present its information as a nice bar graph, and so on. It’s just a matter of where you want to take it.
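As one example, the per-user idea could be sketched like this. The sub name count_once_per_user and the [uid, name] record shape are inventions for the illustration, not part of the program above; the sample data stands in for a real walk of the process table.

```perl
use strict;
use warnings;

# Count each program at most once per user within a single survey.
# $records is a reference to an array of [uid, program name] pairs;
# $collection is the running tally, keyed by program name.
sub count_once_per_user {
    my ($records, $collection) = @_;
    my %seen;
    for my $record (@$records) {
        my ($uid, $name) = @$record;
        # skip if this user has already been counted for this program
        next if $seen{"$uid\0$name"}++;
        $collection->{$name}++;
    }
    return $collection;
}

# Two emacs processes for uid 1001 count once; uid 1002 adds a second.
my $tally = count_once_per_user(
    [ [ 1001, 'emacs' ], [ 1001, 'emacs' ], [ 1002, 'emacs' ], [ 1001, 'zsh' ] ],
    {},
);
print "$_\t$tally->{$_}\n" for sort keys %$tally;
```

In the survey program, the records could be built with something like map { [ $_->uid, $_->fname ] } @{$tobj->table}, with the tally taking the place of %collection.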