Chapter 24. Auditing Your Environment for Improvements
Joan O’Callaghan
Adopting an SRE (site reliability engineering) mindset doesn’t have to wait until your first official SRE project. What can you do to make your company more reliable when you have no SRE staff? The first step is to review what you already have. You need to know your environment better: audit it and record the risks. Start with your worst-case scenario. Security breaches, data loss, and downtime are bad for everyone, but what would destroy your business? Know your kryptonite and focus on that first.
Next, move on to capacity. If you don’t know your limits, you can’t stay safe or plan for growth. Determine whether you have any capacity issues. How much headroom do you have, if any? What is the lead time to get more of anything? Dig into whether your traffic or usage has peaks or recurring patterns.
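One way to make the headroom and lead-time questions concrete is a small calculation per resource. The sketch below is illustrative only: it assumes peak usage grows at a constant compound monthly rate (a simplification, not a forecast), and the `Resource` class, resource names, and numbers are hypothetical. A resource needs attention when it would fill up before new capacity could arrive.

```python
import math
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    capacity: float          # total usable capacity (e.g. GB, requests/sec)
    peak_usage: float        # observed peak usage, not the average
    monthly_growth: float    # ASSUMED compound growth rate, e.g. 0.05 = 5%/month
    lead_time_months: float  # time it takes to get more of this resource

    def headroom(self) -> float:
        """Fraction of capacity still free at peak usage."""
        return 1 - self.peak_usage / self.capacity

    def months_until_full(self) -> float:
        """Months until peak usage, growing at monthly_growth, reaches capacity."""
        if self.peak_usage >= self.capacity:
            return 0.0
        if self.monthly_growth <= 0:
            return math.inf  # no growth assumed: this resource never fills up
        return math.log(self.capacity / self.peak_usage) / math.log(1 + self.monthly_growth)

    def must_order_now(self) -> bool:
        """True if the resource fills up before new capacity could arrive."""
        return self.months_until_full() <= self.lead_time_months


# Hypothetical inventory; replace with your own measurements.
inventory = [
    Resource("db-storage-gb", capacity=1000, peak_usage=800,
             monthly_growth=0.05, lead_time_months=6),
    Resource("api-requests-per-sec", capacity=5000, peak_usage=2000,
             monthly_growth=0.03, lead_time_months=1),
]

for r in inventory:
    flag = "ORDER NOW" if r.must_order_now() else "ok"
    print(f"{r.name}: {r.headroom():.0%} headroom, "
          f"full in {r.months_until_full():.1f} months, "
          f"lead time {r.lead_time_months} months -> {flag}")
```

Feed it peak rather than average usage: sizing against averages hides exactly the traffic spikes the questions above ask about.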
Another important area is security. At a fast-moving organization, unfortunately, security can be overlooked until it becomes a problem. Who has access to what, and are people properly off-boarded when they leave? Do you have a password manager, and have you turned on audit logs for your cloud accounts? How many people can destroy your company?
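The access questions can be answered with simple set arithmetic once you have exported lists of who can log in where. A minimal sketch, assuming you can get a roster of current staff and, per system, a list of users and of admins; the function names, system names, and usernames are all hypothetical. It flags accounts that outlived their owners and counts the distinct people holding destructive rights anywhere.

```python
def find_orphaned_accounts(access: dict[str, set[str]],
                           active_staff: set[str]) -> dict[str, list[str]]:
    """Return {system: [usernames]} for accounts whose owner is no longer on staff."""
    orphaned = {}
    for system, users in access.items():
        leftover = users - active_staff
        if leftover:
            orphaned[system] = sorted(leftover)
    return orphaned


def people_who_can_destroy(admins: dict[str, set[str]]) -> set[str]:
    """Distinct people holding admin (destructive) rights on at least one system."""
    return set().union(*admins.values())


# Hypothetical data: in practice, export these from your HR system and from
# each tool's user list (cloud console, source control, on-call tool, and so on).
active_staff = {"alice", "bob", "carol"}
access = {
    "aws-prod": {"alice", "bob", "dave"},
    "github": {"alice", "carol", "erin"},
    "pagerduty": {"bob"},
}
admins = {
    "aws-prod": {"alice", "dave"},
    "github": {"alice", "carol"},
}

print("Not off-boarded:", find_orphaned_accounts(access, active_staff))
destroyers = people_who_can_destroy(admins)
print(f"{len(destroyers)} people can destroy the company:", sorted(destroyers))
```

In this made-up data, "dave" has left but still holds production admin rights, which is precisely the kind of finding worth fixing first.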
For infrastructure, you want to think about backups. Start by making a fast infrastructure diagram: just whiteboard it and take a photo. Is there only one of anything, making it a single point of failure? Is it all reproducible? ...
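Once the diagram exists, its questions can be turned into a repeatable checklist. A minimal sketch, assuming you transcribe each box on the whiteboard into a hypothetical `Component` record with a copy count and two yes/no answers; the field names and example components are illustrative, not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    instances: int      # how many copies run in production
    reproducible: bool  # can it be rebuilt from code/config alone?
    has_backup: bool    # is its data backed up? (True for stateless components)


def audit(components: list[Component]) -> list[str]:
    """Return one human-readable finding per risk spotted in the diagram."""
    findings = []
    for c in components:
        if c.instances < 2:
            findings.append(f"{c.name}: single point of failure (instances: {c.instances})")
        if not c.reproducible:
            findings.append(f"{c.name}: cannot be rebuilt from code")
        if not c.has_backup:
            findings.append(f"{c.name}: data not backed up")
    return findings


# Hypothetical inventory transcribed from a whiteboard diagram.
components = [
    Component("web", instances=3, reproducible=True, has_backup=True),
    Component("postgres-primary", instances=1, reproducible=False, has_backup=True),
    Component("build-server", instances=1, reproducible=False, has_backup=False),
]

for finding in audit(components):
    print(finding)
```

Keeping the inventory as data rather than only as a photo means the audit can be rerun whenever the diagram changes.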