Summary
In the previous chapter, Chapter 7, FileSystem Errors and Recovery, we noticed a simple RAID failure message in our /var/log/messages log file. In this chapter, we used a Data Collector approach to investigate the cause of that failure message.
After investigating with the RAID management command mdadm, we found several RAID devices in a degraded state. Using dmesg, we determined which hard drives were affected and that the disks had, at some point, been removed from service. We also found that the disks' event counts were mismatched, which prevented the disks from being re-added automatically.
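As a rough illustration of spotting a degraded array, the following Python sketch scans text in the style of /proc/mdstat for a status map such as [U_], where an underscore marks a missing member. The sample text, device names, and the degraded_arrays function are hypothetical and not taken from the chapter; real output may differ in layout.

```python
import re

# Hypothetical sample in the style of /proc/mdstat; device names are illustrative.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md126 : active raid1 sda1[0]
      511936 blocks super 1.0 [2/1] [U_]

md127 : active raid1 sdb2[1] sda2[0]
      7864320 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return names of arrays whose status map (e.g. [U_]) shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md126 : active raid1 ..."
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status map contains only 'U' (up) and '_' (missing) characters.
        status = re.search(r"\[([U_]+)\]", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

On a live system, the same function could be given the contents of /proc/mdstat instead of the sample string.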
We verified with dmesg that the devices were not physically faulty and chose to re-add them to the RAID array.
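The event count comparison that precedes a re-add can be sketched in Python as follows. This is a minimal example that assumes each member's mdadm --examine output contains a line of the form "Events : N"; the sample text, device names, and the event_count and stale_members helpers are hypothetical, and some metadata versions may print the count differently.

```python
import re


def event_count(examine_text):
    """Extract the integer after 'Events :' from examine-style text, or None."""
    match = re.search(r"^\s*Events\s*:\s*(\d+)\s*$", examine_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def stale_members(examine_by_device):
    """Return devices whose event count is below the highest count seen."""
    counts = {dev: event_count(text) for dev, text in examine_by_device.items()}
    known = [c for c in counts.values() if c is not None]
    if not known:
        return []
    newest = max(known)
    return sorted(dev for dev, c in counts.items() if c is not None and c < newest)


# Hypothetical excerpts in the style of `mdadm --examine` output.
SAMPLE = {
    "/dev/sda1": "     Raid Level : raid1\n         Events : 60\n",
    "/dev/sdb1": "     Raid Level : raid1\n         Events : 48\n",
}

if __name__ == "__main__":
    print(stale_members(SAMPLE))
```

A device reported here lags behind the rest of the array and is a candidate for mdadm's --re-add option, once it has been confirmed that the disk is not physically faulty.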
While this chapter focused heavily ...