Modern System Administration

Book description

Early system administration required in-depth knowledge of a variety of services on individual systems. Today the job is more complex and differs from one company to the next, with an ever-growing list of technologies and third-party services to integrate. How does any one person stay relevant across so many systems and services? This practical guide helps anyone in operations (sysadmins, automation engineers, IT professionals, and site reliability engineers) understand the essential concepts of the role today.

Collaboration, automation, and the evolution of systems change the fundamentals of operations work. No matter where you are in your journey, this book gives you the information you need to chart your own path to stronger system administration skills. Author Jennifer Davis provides examples of modern practices and tools, along with recommended materials for further study.

Topics include:

  • Development and testing: Version control, fundamentals of virtualization and containers, testing, and architecture review
  • Deploying and configuring services: Infrastructure management, networks, security, storage, serverless, and release management
  • Scaling administration: Monitoring and observability, capacity planning, log management and analysis, and security and compliance

Table of contents

  1. Foreword
  2. Preface
    1. Who Should Read This Book?
    2. What This Book Is Not
    3. Scope of This Book
    4. If I Could Tell You Only One Thing
    5. If I Could Tell You Only One More Thing
    6. Conventions Used in This Book
    7. O’Reilly Online Learning
    8. How to Contact Us
    9. Acknowledgments
  3. Introducing Modern System Administration
    1. Map Your Journey
    2. Embrace a Mindset Shift
      1. What Is the Job?
      2. Flavors of System Administration
    3. Embrace Evolving Practices
    4. Embrace Collaboration
    5. Embrace Sustainability
    6. Wrapping Up
  4. I. Reasoning About Systems
  5. 1. Patterns and Interconnections
    1. How to Connect Things
    2. How Things Communicate
      1. Application Layer
      2. Transport Layer
      3. Network Layer
      4. Data Link Layer
      5. Physical Layer
    3. Wrapping Up
  6. 2. Computing Environments
    1. Common Workloads
    2. Choosing the Location of Your Workloads
      1. On-Prem
      2. Cloud Computing
    3. Compute Options
      1. Serverless
      2. Containers
      3. Virtual Machines
    4. Guidelines for Choosing Compute
    5. Wrapping Up
  7. 3. Storage
    1. Why Care About Storage?
    2. Key Characteristics
    3. Storage Categories
      1. Block Storage
      2. File Storage
      3. Object Storage
      4. Database Storage
    4. Considerations for Your Storage Strategy
      1. Anticipate Your Capacity and Latency Requirements
      2. Retain Your Data as Long as Is Reasonably Necessary
      3. Respect the Privacy Concerns of Your Users
      4. Defend Your Data
      5. Be Prepared to Handle Disaster Recovery Situations
    5. Wrapping Up
  8. 4. Network
    1. Caring About Networks
    2. Key Characteristics of Networks
    3. Build a Network
    4. Virtualization
    5. Software-Defined Networks
    6. Content Distribution Networks
    7. Guidelines to Your Network Strategy
    8. Wrapping Up
  9. II. Practices
  10. 5. Sysadmin Toolkit
    1. What Is Your Digital Toolkit?
    2. The Components of Your Toolkit
      1. Choosing an Editor
      2. Choosing Programming Languages
      3. Frameworks and Libraries
      4. Other Helpful Utilities
    3. Wrapping Up
  11. 6. Version Control
    1. What Is Version Control?
    2. Benefits of Version Control
    3. Organizing Infra Projects
    4. Wrapping Up
  12. 7. Testing
    1. You’re Already Testing
    2. Common Types of Testing
      1. Linting
      2. Unit Tests
      3. Integration Tests
      4. End-to-End Tests
    3. Explicit Testing Strategy
    4. Improving Your Tests; Learning from Failure
    5. Next Steps
    6. Wrapping Up
  13. 8. Infrastructure Security
    1. What Is Infrastructure Security?
    2. Share Security Responsibilities
    3. Borrow the Attacker Lens
    4. Design for Security Operability
    5. Categorize Discovered Issues
    6. Wrapping Up
  14. 9. Documentation
    1. Know Your Audience
    2. Dimensions of Documentation
    3. Organization Practices
      1. Organizing a Topic
      2. Organizing a Site
    4. Recommendations for Quality Documentation
    5. Wrapping Up
  15. 10. Presentations
    1. Know Your Audience
    2. Choose Your Channel
    3. Choose Your Story Type
    4. Storytelling in Practice
      1. Case #1: Charts Are Worth a Thousand Words
      2. Case #2: Telling the Same Story with a Different Audience
      3. The Key Takeaways
    5. Know Your Visuals
      1. Visual Cues
      2. Chart Types
    6. Recommended Visualization Practices
    7. Wrapping Up
  16. III. Assembling the System
  17. 11. Scripting Infrastructure
    1. Why Script Your Infrastructure?
    2. Three Lenses to Model Your Infrastructure
      1. Code to Build Machine Images
      2. Code to Provision Infrastructure
      3. Code to Configure Infrastructure
    3. Getting Started
    4. Wrapping Up
  18. 12. Managing Your Infrastructure
    1. Infrastructure as Code
    2. Treating Your Infrastructure as Data
    3. Getting Started with Infrastructure Management
      1. Linting
      2. Writing Unit Tests
      3. Writing Integration Tests
      4. Writing End-to-End Tests
    4. Wrapping Up
  19. 13. Securing Your Infrastructure
    1. Assessing Attack Vectors
    2. Manage Identity and Access
      1. How Should You Control Access to Your System?
      2. Who Should Have Access to Your System?
    3. Manage Secrets
      1. Password Managers and Secret Management Software
      2. Defending Secrets and Monitoring Usage
    4. Securing Your Computing Environment
    5. Securing Your Network
    6. Security Recommendations for Your Infrastructure Management
    7. Wrapping Up
  20. IV. Monitoring the System
  21. 14. Monitoring Theory
    1. Why Monitor?
    2. How Do Monitoring and Observability Differ?
    3. Monitoring Building Blocks
      1. Events
      2. Monitors
      3. Data: Metrics, Logs, and Tracing
    4. First-Level Monitoring
      1. Event Detection
      2. Data Collection
      3. Data Reduction
      4. Data Analysis
      5. Data Presentation
    5. Second-Level Monitoring
    6. Wrapping Up
  22. 15. Compute and Software Monitoring in Practice
    1. Identify Your Desired Outputs
    2. What Should You Monitor?
      1. Do What You Can Now
      2. Monitors That Matter
    3. Plan for a Monitoring Project
    4. What Alerts Should You Set?
    5. Examine Monitoring Platforms
    6. Choose a Monitoring Tool or Platform
    7. Wrapping Up
  23. 16. Managing Monitoring Data
    1. What Is Monitoring Data?
      1. Metrics
      2. Logs
      3. Structured Logs
      4. Tracing
      5. Distributed Tracing
    2. Choose Your Data Types
    3. Retain Log Data
    4. Analyze Log Data
    5. Monitoring Data at Scale
    6. Wrapping Up
  24. 17. Monitor Your Work
    1. Why Should You Monitor Your Work?
    2. Manage Your Work with Kanban
    3. Choose a Platform
    4. Find the Interesting Information
    5. Wrapping Up
  25. V. Scaling the System
  26. 18. Capacity Management
    1. What Is Capacity?
    2. The Capacity Management Model
      1. Resource Procurement
      2. Justification
      3. Management
      4. Monitoring
    3. The Framework for Capacity Planning
    4. Do You Need Capacity Planning with Cloud Computing?
    5. Wrapping Up
  27. 19. Developing On-Call Resilience
    1. What Is On-Call?
    2. Humane On-Call Processes
      1. Check Your On-Call Policies
      2. Preparing for On-Call
      3. One Week Out
      4. The Night Before
      5. Your On-Call Rotation
      6. On-Call Handoff
      7. The Day After On-Call
    3. Monitor the On-Call Experience
    4. Wrapping Up
  28. 20. Managing Incidents
    1. What Is an Incident?
    2. What Is Incident Management?
    3. Planning and Preparing for Incidents
      1. Set Up and Document Communication Channels
      2. Train for Effective Communication
      3. Create Templates
      4. Maintain Documentation
      5. Document the Risks
      6. Practice Failure
      7. Understand Your Tools
      8. Clearly Define Roles and Responsibilities
      9. Understand Severity Levels and Escalation Protocols
    4. Responding to Incidents
    5. Learning from the Incident
      1. How Deep Should You Dig?
      2. Aiding Discovery
      3. Documenting Incidents Effectively
      4. Distributing the Information
    6. Next Steps
    7. Wrapping Up
  29. 21. Leading Sustainable Teams
    1. Collective Leadership
    2. Adopt a Whole-Team Approach
      1. Build Resilient On-Call Teams
      2. Update On-Call Processes
    3. Monitor the Team’s Work
      1. Why Monitor the Team?
      2. What Should You Monitor?
      3. Measure Impact on the Team
    4. Support Team Infrastructure with Documentation
    5. Budget a Learning Culture
    6. Adapt to Challenges
    7. Wrapping Up
  30. Conclusion
  31. A. Protocols in Practice
    1. Hypertext Transfer Protocol
      1. QUIC
      2. Domain Name System
  32. B. Resolving Test Failures
    1. Test Failure Type #1: Environment Problems
    2. Test Failure Type #2: Flawed Test Logic
    3. Test Failure Type #3: Changing Assumptions
    4. Test Failure Type #4: Flaky Tests
    5. Test Failure Type #5: Code Defects
  33. Index
  34. About the Author

Product information

  • Title: Modern System Administration
  • Author(s): Jennifer Davis
  • Release date: November 2022
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781492055211