Earlier, we noticed that the filesystem was 100 percent full. Unfortunately, the version of sysstat we have installed doesn't capture disk space usage. It would be useful to identify when the filesystem filled up, compared with when our run queue started to increase:
Jul 5 01:48:01 localhost auditd: Audit daemon is low on disk space for logging
Jul 5 01:48:01 localhost auditd: Audit daemon is suspending logging due to low disk space.
From the log messages we saw earlier, we can see that the auditd process identified the low disk space at 01:48. This is extremely close to the time our run queue spike was seen.
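When the log is long, it can help to pull this timestamp out programmatically rather than by eye. The following is a minimal sketch, assuming the classic syslog line layout shown in the messages above. The function name first_low_disk_event and the two regular expressions are illustrative choices, not part of auditd or any standard tool. It returns the timestamp of the first auditd message that mentions low disk space, which can then be compared against the time of the run queue spike.

```python
import re
from datetime import datetime

# Assumed layout (classic syslog): "Mon D HH:MM:SS host process: message"
SYSLOG_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+(?P<clock>\d{2}:\d{2}:\d{2}))\s+"
    r"\S+\s+(?P<proc>[^:\[\s]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)
# Matches both "low on disk space" and "low disk space".
LOW_DISK_RE = re.compile(r"low (?:on )?disk space", re.IGNORECASE)


def first_low_disk_event(lines):
    """Return (stamp, time_of_day) for the first auditd low-disk message, or None."""
    for line in lines:
        match = SYSLOG_RE.match(line.strip())
        # Skip lines that are not syslog-shaped or not written by auditd.
        if match is None or match.group("proc") != "auditd":
            continue
        if LOW_DISK_RE.search(match.group("msg")):
            clock = datetime.strptime(match.group("clock"), "%H:%M:%S").time()
            return match.group("stamp"), clock
    return None


# The two messages quoted above, used here as sample input.
sample = [
    "Jul 5 01:48:01 localhost auditd: Audit daemon is low on disk space for logging",
    "Jul 5 01:48:01 localhost auditd: Audit daemon is suspending logging due to low disk space.",
]
print(first_low_disk_event(sample))
```

In practice the lines would come from the system log file rather than a hard-coded list. The location of that file varies between systems, so it is left out of the sketch.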
This is building towards a hypothesis that the problem's root cause was a filesystem filling ...