Chapter 3

Basic investigation

Abstract

In this chapter, we focus on the methodology and steps needed to perform successful first-level system debugging and analysis. We will use system logs and statistics to understand how a problem manifests.

Keywords

top
ps
dmesg
iostat
vmstat
sar

Profile the system status

Previous chapters taught us the models to apply when approaching what may appear to be a problem in our environment. The idea is to carefully isolate the problem, reduce it to a minimal set of variables, and then use industry-accepted methods to prove or disprove our theories. Now, we will learn about the tools that can help us in our quest.
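
As a rough illustration of what a first-level profile might look like in practice, the sketch below runs each of the tools named in the chapter keywords once and keeps their raw output for later inspection. It is only a sketch under stated assumptions: the function name collect_snapshot, the DEFAULT_COMMANDS table, and the particular flags are choices made for this example (they are common Linux procps and sysstat options), not something prescribed by the chapter. Tools that are not installed are skipped rather than treated as errors.

```python
import shutil
import subprocess

# Hypothetical command table for illustration only. The tool names come from
# the chapter keywords; the flags are assumptions based on common Linux
# procps/sysstat usage and may differ on other platforms.
DEFAULT_COMMANDS = {
    "top": ["top", "-b", "-n", "1"],       # one batch-mode iteration
    "ps": ["ps", "aux"],                   # full process listing
    "dmesg": ["dmesg"],                    # kernel ring buffer (may need privileges)
    "iostat": ["iostat", "-x", "1", "2"],  # extended I/O stats, two samples
    "vmstat": ["vmstat", "1", "2"],        # memory/CPU stats, two samples
    "sar": ["sar", "-u", "1", "1"],        # CPU utilization, one sample
}


def collect_snapshot(commands=None, timeout=30):
    """Run each available command once and return {name: stdout or None}.

    A value of None means the tool was not found, could not be started,
    or did not finish within `timeout` seconds.
    """
    if commands is None:
        commands = DEFAULT_COMMANDS
    snapshot = {}
    for name, argv in commands.items():
        # Skip tools that are not installed on this host.
        if shutil.which(argv[0]) is None:
            snapshot[name] = None
            continue
        try:
            result = subprocess.run(
                argv,
                capture_output=True,
                text=True,
                timeout=timeout,
                check=False,  # a non-zero exit is recorded, not raised
            )
            snapshot[name] = result.stdout
        except (OSError, subprocess.TimeoutExpired):
            snapshot[name] = None
    return snapshot
```

Calling collect_snapshot() with no arguments returns a dictionary keyed by tool name. Saving that dictionary alongside a timestamp gives a baseline that can be compared against a later snapshot taken while the problem is occurring.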

Environment monitors

Typically, data center hosts are configured to ...

