Monitoring System Load
Have you ever watched a system slow down as the wait state and uptime load statistics climb, until it finally crashes? I have, and it is not a pretty sight when all of the heads start popping up over the cubes. In this chapter, we are going to look at some techniques to monitor the CPU load on a UNIX system. When the system is unhappy running under a heavy load, there are many possible causes. The system may have a runaway process that is producing a ton of zombie processes every second, or a tape drive failure may let your database redo logs fill up a filesystem and cause the database and SAP to stop. In any case, we want to be proactive and catch the symptoms while the load is still in its early stages.
There are really only three basic things to look at when monitoring the CPU load on the system. First, look at the load statistics produced as part of the uptime command. This output indicates the average number of jobs in the run queue over the last 1, 5, and 15 minutes on AIX, HP-UX, Linux, OpenBSD, and Solaris. The second measurement to look at is the percentage of CPU time spent in each of four states: system/kernel, user/application, I/O wait, and idle. These four measurements can be obtained from the iostat, vmstat, and sar outputs. We will look at each of these commands individually. The third thing to look at is the CPU hogs, the processes that are consuming the most CPU time. Of course, to get a good feel for how the system is running we need ...
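To make the first of these measurements concrete, here is a minimal sketch of pulling the load averages out of the uptime output and comparing the first one to a limit. The function names, the MAX_LOAD value, and the message text are all hypothetical, and the sketch assumes that the uptime line ends with "load average:" (or "load averages:") followed by three numbers, which you should verify on your own flavor of UNIX.

```shell
#!/bin/sh
# Hypothetical threshold; tune it to the number of CPUs and the workload.
MAX_LOAD=2.00

# Print the three load averages from one line of uptime output on stdin.
get_load_averages() {
    # Strip everything through "load average:" or "load averages:",
    # then drop the commas that most platforms put between the values.
    sed -n 's/^.*load averages\{0,1\}: *//p' | tr -d ','
}

# Succeed if $1 (measured load) is greater than $2 (threshold).
# The shell cannot compare decimals itself, so awk does the arithmetic.
load_exceeds() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

# Report on the first load average found in the uptime line passed as $1.
check_load() {
    loads=$(printf '%s\n' "$1" | get_load_averages)
    first=${loads%% *}
    if load_exceeds "$first" "$MAX_LOAD"; then
        echo "WARNING: load average $first exceeds $MAX_LOAD"
    else
        echo "OK: load average $first"
    fi
}

# LC_ALL=C keeps the decimal point a period so the comma stripping is safe.
check_load "$(LC_ALL=C uptime)"
```

Keeping the parsing in its own function means that only the sed expression has to change if a particular system formats its uptime line differently.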