The Practice of System and Network Administration: Volume 1: DevOps and other Best Practices for Enterprise IT, 3rd Edition

Book description

With 28 new chapters, the third edition of The Practice of System and Network Administration innovates yet again! Revised with thousands of updates and clarifications based on reader feedback, this new edition also incorporates DevOps strategies even for non-DevOps environments.

Whether you use Linux, Unix, or Windows, this new edition describes the essential practices previously handed down only from mentor to protégé. This wonderfully lucid, often funny cornucopia of information introduces beginners to advanced frameworks valuable for their entire career, yet is structured to help even experts through difficult projects.

Other books tell you what commands to type. This book teaches you the cross-platform strategies that are timeless!

  • DevOps techniques: Apply DevOps principles to enterprise IT infrastructure, even in environments without developers

  • Game-changing strategies: New ways to deliver results faster with less stress

  • Fleet management: A comprehensive guide to managing your fleet of desktops, laptops, servers and mobile devices

  • Service management: How to design, launch, upgrade and migrate services

  • Measurable improvement: Assess your operational effectiveness with a forty-page, pain-free assessment system you can start using today to raise the quality of all services

  • Design guides: Best practices for networks, data centers, email, storage, monitoring, backups and more

  • Management skills: Organization design, communication, negotiation, ethics, hiring and firing, and more

Have you ever had any of these problems?

  • Have you been surprised to discover your backup tapes are blank?

  • Ever spent a year launching a new service only to be told the users hate it?

  • Do you have more incoming support requests than you can handle?

  • Do you spend more time fixing problems than building the next awesome thing?

  • Have you suffered from a botched migration of thousands of users to a new service?

  • Does your company rely on a computer that, if it died, can't be rebuilt?

  • Is your network a fragile mess that breaks any time you try to improve it?

  • Is there a periodic hell month that happens twice a year? Twelve times a year?

  • Do you find out about problems when your users call you to complain?

  • Does your corporate Change Review Board terrify you?

  • Does each division of your company have its own broken way of doing things?

  • Do you fear that automation will replace you, or break more than it fixes?

  • Are you underpaid and overworked?

No vague management speak or empty platitudes. This comprehensive guide provides real solutions that prevent these problems and more!

Table of contents

  1. Cover Page
  2. Title Page
  3. Copyright Page
  4. Preface
  5. Acknowledgments
  6. About the Authors
  7. Part I Game-Changing Strategies
    1. 1 Climbing Out of the Hole
      1. 1.1 Organizing WIP
        1. 1.1.1 Ticket Systems
        2. 1.1.2 Kanban
        3. 1.1.3 Tickets and Kanban
      2. 1.2 Eliminating Time Sinkholes
        1. 1.2.1 OS Installation and Configuration
        2. 1.2.2 Software Deployment
      3. 1.3 DevOps
      4. 1.4 DevOps Without Devs
      5. 1.5 Bottlenecks
      6. 1.6 Getting Started
      7. 1.7 Summary
      8. Exercises
    2. 2 The Small Batches Principle
      1. 2.1 The Carpenter Analogy
      2. 2.2 Fixing Hell Month
      3. 2.3 Improving Emergency Failovers
      4. 2.4 Launching Early and Often
      5. 2.5 Summary
      6. Exercises
    3. 3 Pets and Cattle
      1. 3.1 The Pets and Cattle Analogy
      2. 3.2 Scaling
      3. 3.3 Desktops as Cattle
      4. 3.4 Server Hardware as Cattle
      5. 3.5 Pets Store State
      6. 3.6 Isolating State
      7. 3.7 Generic Processes
      8. 3.8 Moving Variations to the End
      9. 3.9 Automation
      10. 3.10 Summary
      11. Exercises
    4. 4 Infrastructure as Code
      1. 4.1 Programmable Infrastructure
      2. 4.2 Tracking Changes
      3. 4.3 Benefits of Infrastructure as Code
      4. 4.4 Principles of Infrastructure as Code
      5. 4.5 Configuration Management Tools
        1. 4.5.1 Declarative Versus Imperative
        2. 4.5.2 Idempotency
        3. 4.5.3 Guards and Statements
      6. 4.6 Example Infrastructure as Code Systems
        1. 4.6.1 Configuring a DNS Client
        2. 4.6.2 A Simple Web Server
        3. 4.6.3 A Complex Web Application
      7. 4.7 Bringing Infrastructure as Code to Your Organization
      8. 4.8 Infrastructure as Code for Enhanced Collaboration
      9. 4.9 Downsides to Infrastructure as Code
      10. 4.10 Automation Myths
      11. 4.11 Summary
      12. Exercises
  8. Part II Workstation Fleet Management
    1. 5 Workstation Architecture
      1. 5.1 Fungibility
      2. 5.2 Hardware
      3. 5.3 Operating System
      4. 5.4 Network Configuration
        1. 5.4.1 Dynamic Configuration
        2. 5.4.2 Hardcoded Configuration
        3. 5.4.3 Hybrid Configuration
        4. 5.4.4 Applicability
      5. 5.5 Accounts and Authorization
      6. 5.6 Data Storage
      7. 5.7 OS Updates
      8. 5.8 Security
        1. 5.8.1 Theft
        2. 5.8.2 Malware
      9. 5.9 Logging
      10. 5.10 Summary
      11. Exercises
    2. 6 Workstation Hardware Strategies
      1. 6.1 Physical Workstations
        1. 6.1.1 Laptop Versus Desktop
        2. 6.1.2 Vendor Selection
        3. 6.1.3 Product Line Selection
      2. 6.2 Virtual Desktop Infrastructure
        1. 6.2.1 Reduced Costs
        2. 6.2.2 Ease of Maintenance
        3. 6.2.3 Persistent or Non-persistent?
      3. 6.3 Bring Your Own Device
        1. 6.3.1 Strategies
        2. 6.3.2 Pros and Cons
        3. 6.3.3 Security
        4. 6.3.4 Additional Costs
        5. 6.3.5 Usability
      4. 6.4 Summary
      5. Exercises
    3. 7 Workstation Software Life Cycle
      1. 7.1 Life of a Machine
      2. 7.2 OS Installation
      3. 7.3 OS Configuration
        1. 7.3.1 Configuration Management Systems
        2. 7.3.2 Microsoft Group Policy Objects
        3. 7.3.3 DHCP Configuration
        4. 7.3.4 Package Installation
      4. 7.4 Updating the System Software and Applications
        1. 7.4.1 Updates Versus Installations
        2. 7.4.2 Update Methods
      5. 7.5 Rolling Out Changes . . . Carefully
      6. 7.6 Disposal
        1. 7.6.1 Accounting
        2. 7.6.2 Technical: Decommissioning
        3. 7.6.3 Technical: Data Security
        4. 7.6.4 Physical
      7. 7.7 Summary
      8. Exercises
    4. 8 OS Installation Strategies
      1. 8.1 Consistency Is More Important Than Perfection
      2. 8.2 Installation Strategies
        1. 8.2.1 Automation
        2. 8.2.2 Cloning
        3. 8.2.3 Manual
      3. 8.3 Test-Driven Configuration Development
      4. 8.4 Automating in Steps
      5. 8.5 When Not to Automate
      6. 8.6 Vendor Support of OS Installation
      7. 8.7 Should You Trust the Vendor’s Installation?
      8. 8.8 Summary
      9. Exercises
    5. 9 Workstation Service Definition
      1. 9.1 Basic Service Definition
        1. 9.1.1 Approaches to Platform Definition
        2. 9.1.2 Application Selection
        3. 9.1.3 Leveraging a CMDB
      2. 9.2 Refresh Cycles
        1. 9.2.1 Choosing an Approach
        2. 9.2.2 Formalizing the Policy
        3. 9.2.3 Aligning with Asset Depreciation
      3. 9.3 Tiered Support Levels
      4. 9.4 Workstations as a Managed Service
      5. 9.5 Summary
      6. Exercises
    6. 10 Workstation Fleet Logistics
      1. 10.1 What Employees See
      2. 10.2 What Employees Don’t See
        1. 10.2.1 Purchasing Team
        2. 10.2.2 Prep Team
        3. 10.2.3 Delivery Team
        4. 10.2.4 Platform Team
        5. 10.2.5 Network Team
        6. 10.2.6 Tools Team
        7. 10.2.7 Project Management
        8. 10.2.8 Program Office
      3. 10.3 Configuration Management Database
      4. 10.4 Small-Scale Fleet Logistics
        1. 10.4.1 Part-Time Fleet Management
        2. 10.4.2 Full-Time Fleet Coordinators
      5. 10.5 Summary
      6. Exercises
    7. 11 Workstation Standardization
      1. 11.1 Involving Customers Early
      2. 11.2 Releasing Early and Iterating
      3. 11.3 Having a Transition Interval (Overlap)
      4. 11.4 Ratcheting
      5. 11.5 Setting a Cut-Off Date
      6. 11.6 Adapting for Your Corporate Culture
      7. 11.7 Leveraging the Path of Least Resistance
      8. 11.8 Summary
      9. Exercises
    8. 12 Onboarding
      1. 12.1 Making a Good First Impression
      2. 12.2 IT Responsibilities
      3. 12.3 Five Keys to Successful Onboarding
        1. 12.3.1 Drive the Process with an Onboarding Timeline
        2. 12.3.2 Determine Needs Ahead of Arrival
        3. 12.3.3 Perform the Onboarding
        4. 12.3.4 Communicate Across Teams
        5. 12.3.5 Reflect On and Improve the Process
      4. 12.4 Cadence Changes
      5. 12.5 Case Studies
        1. 12.5.1 Worst Onboarding Experience Ever
        2. 12.5.2 Lumeta’s Onboarding Process
        3. 12.5.3 Google’s Onboarding Process
      6. 12.6 Summary
      7. Exercises
  9. Part III Servers
    1. 13 Server Hardware Strategies
      1. 13.1 All Eggs in One Basket
      2. 13.2 Beautiful Snowflakes
        1. 13.2.1 Asset Tracking
        2. 13.2.2 Reducing Variations
        3. 13.2.3 Global Optimization
      3. 13.3 Buy in Bulk, Allocate Fractions
        1. 13.3.1 VM Management
        2. 13.3.2 Live Migration
        3. 13.3.3 VM Packing
        4. 13.3.4 Spare Capacity for Maintenance
        5. 13.3.5 Unified VM/Non-VM Management
        6. 13.3.6 Containers
      4. 13.4 Grid Computing
      5. 13.5 Blade Servers
      6. 13.6 Cloud-Based Compute Services
        1. 13.6.1 What Is the Cloud?
        2. 13.6.2 Cloud Computing’s Cost Benefits
        3. 13.6.3 Software as a Service
      7. 13.7 Server Appliances
      8. 13.8 Hybrid Strategies
      9. 13.9 Summary
      10. Exercises
    2. 14 Server Hardware Features
      1. 14.1 Workstations Versus Servers
        1. 14.1.1 Server Hardware Design Differences
        2. 14.1.2 Server OS and Management Differences
      2. 14.2 Server Reliability
        1. 14.2.1 Levels of Redundancy
        2. 14.2.2 Data Integrity
        3. 14.2.3 Hot-Swap Components
        4. 14.2.4 Servers Should Be in Computer Rooms
      3. 14.3 Remotely Managing Servers
        1. 14.3.1 Integrated Out-of-Band Management
        2. 14.3.2 Non-integrated Out-of-Band Management
      4. 14.4 Separate Administrative Networks
      5. 14.5 Maintenance Contracts and Spare Parts
        1. 14.5.1 Vendor SLA
        2. 14.5.2 Spare Parts
        3. 14.5.3 Tracking Service Contracts
        4. 14.5.4 Cross-Shipping
      6. 14.6 Selecting Vendors with Server Experience
      7. 14.7 Summary
      8. Exercises
    3. 15 Server Hardware Specifications
      1. 15.1 Models and Product Lines
      2. 15.2 Server Hardware Details
        1. 15.2.1 CPUs
        2. 15.2.2 Memory
        3. 15.2.3 Network Interfaces
        4. 15.2.4 Disks: Hardware Versus Software RAID
        5. 15.2.5 Power Supplies
      3. 15.3 Things to Leave Out
      4. 15.4 Summary
      5. Exercises
  10. Part IV Services
    1. 16 Service Requirements
      1. 16.1 Services Make the Environment
      2. 16.2 Starting with a Kick-Off Meeting
      3. 16.3 Gathering Written Requirements
      4. 16.4 Customer Requirements
        1. 16.4.1 Describing Features
        2. 16.4.2 Questions to Ask
        3. 16.4.3 Service Level Agreements
        4. 16.4.4 Handling Difficult Requests
      5. 16.5 Scope, Schedule, and Resources
      6. 16.6 Operational Requirements
        1. 16.6.1 System Observability
        2. 16.6.2 Remote and Central Management
        3. 16.6.3 Scaling Up or Out
        4. 16.6.4 Software Upgrades
        5. 16.6.5 Environment Fit
        6. 16.6.6 Support Model
        7. 16.6.7 Service Requests
        8. 16.6.8 Disaster Recovery
      7. 16.7 Open Architecture
      8. 16.8 Summary
      9. Exercises
    2. 17 Service Planning and Engineering
      1. 17.1 General Engineering Basics
      2. 17.2 Simplicity
      3. 17.3 Vendor-Certified Designs
      4. 17.4 Dependency Engineering
        1. 17.4.1 Primary Dependencies
        2. 17.4.2 External Dependencies
        3. 17.4.3 Dependency Alignment
      5. 17.5 Decoupling Hostname from Service Name
      6. 17.6 Support
        1. 17.6.1 Monitoring
        2. 17.6.2 Support Model
        3. 17.6.3 Service Request Model
        4. 17.6.4 Documentation
      7. 17.7 Summary
      8. Exercises
    3. 18 Service Resiliency and Performance Patterns
      1. 18.1 Redundancy Design Patterns
        1. 18.1.1 Primary and Secondary
        2. 18.1.2 Load Balancers Plus Replicas
        3. 18.1.3 Replicas and Shared State
        4. 18.1.4 Performance or Resilience?
      2. 18.2 Performance and Scaling
        1. 18.2.1 Dataflow Analysis for Scaling
        2. 18.2.2 Bandwidth Versus Latency
      3. 18.3 Summary
      4. Exercises
    4. 19 Service Launch: Fundamentals
      1. 19.1 Planning for Problems
      2. 19.2 The Six-Step Launch Process
        1. 19.2.1 Step 1: Define the Ready List
        2. 19.2.2 Step 2: Work the List
        3. 19.2.3 Step 3: Launch the Beta Service
        4. 19.2.4 Step 4: Launch the Production Service
        5. 19.2.5 Step 5: Capture the Lessons Learned
        6. 19.2.6 Step 6: Repeat
      3. 19.3 Launch Readiness Review
        1. 19.3.1 Launch Readiness Criteria
        2. 19.3.2 Sample Launch Criteria
        3. 19.3.3 Organizational Learning
        4. 19.3.4 LRC Maintenance
      4. 19.4 Launch Calendar
      5. 19.5 Common Launch Problems
        1. 19.5.1 Processes Fail in Production
        2. 19.5.2 Unexpected Access Methods
        3. 19.5.3 Production Resources Unavailable
        4. 19.5.4 New Technology Failures
        5. 19.5.5 Lack of User Training
        6. 19.5.6 No Backups
      6. 19.6 Summary
      7. Exercises
    5. 20 Service Launch: DevOps
      1. 20.1 Continuous Integration and Deployment
        1. 20.1.1 Test Ordering
        2. 20.1.2 Launch Categorizations
      2. 20.2 Minimum Viable Product
      3. 20.3 Rapid Release with Packaged Software
        1. 20.3.1 Testing Before Deployment
        2. 20.3.2 Time to Deployment Metrics
      4. 20.4 Cloning the Production Environment
      5. 20.5 Example: DNS/DHCP Infrastructure Software
        1. 20.5.1 The Problem
        2. 20.5.2 Desired End-State
        3. 20.5.3 First Milestone
        4. 20.5.4 Second Milestone
      6. 20.6 Launch with Data Migration
      7. 20.7 Controlling Self-Updating Software
      8. 20.8 Summary
      9. Exercises
    6. 21 Service Conversions
      1. 21.1 Minimizing Intrusiveness
      2. 21.2 Layers Versus Pillars
      3. 21.3 Vendor Support
      4. 21.4 Communication
      5. 21.5 Training
      6. 21.6 Gradual Roll-Outs
      7. 21.7 Flash-Cuts: Doing It All at Once
      8. 21.8 Backout Plan
        1. 21.8.1 Instant Roll-Back
        2. 21.8.2 Decision Point
      9. 21.9 Summary
      10. Exercises
    7. 22 Disaster Recovery and Data Integrity
      1. 22.1 Risk Analysis
      2. 22.2 Legal Obligations
      3. 22.3 Damage Limitation
      4. 22.4 Preparation
      5. 22.5 Data Integrity
      6. 22.6 Redundant Sites
      7. 22.7 Security Disasters
      8. 22.8 Media Relations
      9. 22.9 Summary
      10. Exercises
  11. Part V Infrastructure
    1. 23 Network Architecture
      1. 23.1 Physical Versus Logical
      2. 23.2 The OSI Model
      3. 23.3 Wired Office Networks
        1. 23.3.1 Physical Infrastructure
        2. 23.3.2 Logical Design
        3. 23.3.3 Network Access Control
        4. 23.3.4 Location for Emergency Services
      4. 23.4 Wireless Office Networks
        1. 23.4.1 Physical Infrastructure
        2. 23.4.2 Logical Design
      5. 23.5 Datacenter Networks
        1. 23.5.1 Physical Infrastructure
        2. 23.5.2 Logical Design
      6. 23.6 WAN Strategies
        1. 23.6.1 Topology
        2. 23.6.2 Technology
      7. 23.7 Routing
        1. 23.7.1 Static Routing
        2. 23.7.2 Interior Routing Protocol
        3. 23.7.3 Exterior Gateway Protocol
      8. 23.8 Internet Access
        1. 23.8.1 Outbound Connectivity
        2. 23.8.2 Inbound Connectivity
      9. 23.9 Corporate Standards
        1. 23.9.1 Logical Design
        2. 23.9.2 Physical Design
      10. 23.10 Software-Defined Networks
      11. 23.11 IPv6
        1. 23.11.1 The Need for IPv6
        2. 23.11.2 Deploying IPv6
      12. 23.12 Summary
      13. Exercises
    2. 24 Network Operations
      1. 24.1 Monitoring
      2. 24.2 Management
        1. 24.2.1 Access and Audit Trail
        2. 24.2.2 Life Cycle
        3. 24.2.3 Configuration Management
        4. 24.2.4 Software Versions
        5. 24.2.5 Deployment Process
      3. 24.3 Documentation
        1. 24.3.1 Network Design and Implementation
        2. 24.3.2 DNS
        3. 24.3.3 CMDB
        4. 24.3.4 Labeling
      4. 24.4 Support
        1. 24.4.1 Tools
        2. 24.4.2 Organizational Structure
        3. 24.4.3 Network Services
      5. 24.5 Summary
      6. Exercises
    3. 25 Datacenters Overview
      1. 25.1 Build, Rent, or Outsource
        1. 25.1.1 Building
        2. 25.1.2 Renting
        3. 25.1.3 Outsourcing
        4. 25.1.4 No Datacenter
        5. 25.1.5 Hybrid
      2. 25.2 Requirements
        1. 25.2.1 Business Requirements
        2. 25.2.2 Technical Requirements
      3. 25.3 Summary
      4. Exercises
    4. 26 Running a Datacenter
      1. 26.1 Capacity Management
        1. 26.1.1 Rack Space
        2. 26.1.2 Power
        3. 26.1.3 Wiring
        4. 26.1.4 Network and Console
      2. 26.2 Life-Cycle Management
        1. 26.2.1 Installation
        2. 26.2.2 Moves, Adds, and Changes
        3. 26.2.3 Maintenance
        4. 26.2.4 Decommission
      3. 26.3 Patch Cables
      4. 26.4 Labeling
        1. 26.4.1 Labeling Rack Location
        2. 26.4.2 Labeling Patch Cables
        3. 26.4.3 Labeling Network Equipment
      5. 26.5 Console Access
      6. 26.6 Workbench
      7. 26.7 Tools and Supplies
        1. 26.7.1 Tools
        2. 26.7.2 Spares and Supplies
        3. 26.7.3 Parking Spaces
      8. 26.8 Summary
      9. Exercises
  12. Part VI Helpdesks and Support
    1. 27 Customer Support
      1. 27.1 Having a Helpdesk
      2. 27.2 Offering a Friendly Face
      3. 27.3 Reflecting Corporate Culture
      4. 27.4 Having Enough Staff
      5. 27.5 Defining Scope of Support
      6. 27.6 Specifying How to Get Help
      7. 27.7 Defining Processes for Staff
      8. 27.8 Establishing an Escalation Process
      9. 27.9 Defining “Emergency” in Writing
      10. 27.10 Supplying Request-Tracking Software
      11. 27.11 Statistical Improvements
      12. 27.12 After-Hours and 24/7 Coverage
      13. 27.13 Better Advertising for the Helpdesk
      14. 27.14 Different Helpdesks for Different Needs
      15. 27.15 Summary
      16. Exercises
    2. 28 Handling an Incident Report
      1. 28.1 Process Overview
      2. 28.2 Phase A—Step 1: The Greeting
      3. 28.3 Phase B: Problem Identification
        1. 28.3.1 Step 2: Problem Classification
        2. 28.3.2 Step 3: Problem Statement
        3. 28.3.3 Step 4: Problem Verification
      4. 28.4 Phase C: Planning and Execution
        1. 28.4.1 Step 5: Solution Proposals
        2. 28.4.2 Step 6: Solution Selection
        3. 28.4.3 Step 7: Execution
      5. 28.5 Phase D: Verification
        1. 28.5.1 Step 8: Craft Verification
        2. 28.5.2 Step 9: Customer Verification/Closing
      6. 28.6 Perils of Skipping a Step
      7. 28.7 Optimizing Customer Care
        1. 28.7.1 Model-Based Training
        2. 28.7.2 Holistic Improvement
        3. 28.7.3 Increased Customer Familiarity
        4. 28.7.4 Special Announcements for Major Outages
        5. 28.7.5 Trend Analysis
        6. 28.7.6 Customers Who Know the Process
        7. 28.7.7 An Architecture That Reflects the Process
      8. 28.8 Summary
      9. Exercises
    3. 29 Debugging
      1. 29.1 Understanding the Customer’s Problem
      2. 29.2 Fixing the Cause, Not the Symptom
      3. 29.3 Being Systematic
      4. 29.4 Having the Right Tools
        1. 29.4.1 Training Is the Most Important Tool
        2. 29.4.2 Understanding the Underlying Technology
        3. 29.4.3 Choosing the Right Tools
        4. 29.4.4 Evaluating Tools
      5. 29.5 End-to-End Understanding of the System
      6. 29.6 Summary
      7. Exercises
    4. 30 Fixing Things Once
      1. 30.1 Story: The Misconfigured Servers
      2. 30.2 Avoiding Temporary Fixes
      3. 30.3 Learn from Carpenters
      4. 30.4 Automation
      5. 30.5 Summary
      6. Exercises
    5. 31 Documentation
      1. 31.1 What to Document
      2. 31.2 A Simple Template for Getting Started
      3. 31.3 Easy Sources for Documentation
        1. 31.3.1 Saving Screenshots
        2. 31.3.2 Capturing the Command Line
        3. 31.3.3 Leveraging Email
        4. 31.3.4 Mining the Ticket System
      4. 31.4 The Power of Checklists
      5. 31.5 Wiki Systems
      6. 31.6 Findability
      7. 31.7 Roll-Out Issues
      8. 31.8 A Content-Management System
      9. 31.9 A Culture of Respect
      10. 31.10 Taxonomy and Structure
      11. 31.11 Additional Documentation Uses
      12. 31.12 Off-Site Links
      13. 31.13 Summary
      14. Exercises
  13. Part VII Change Processes
    1. 32 Change Management
      1. 32.1 Change Review Boards
      2. 32.2 Process Overview
      3. 32.3 Change Proposals
      4. 32.4 Change Classifications
      5. 32.5 Risk Discovery and Quantification
      6. 32.6 Technical Planning
      7. 32.7 Scheduling
      8. 32.8 Communication
      9. 32.9 Tiered Change Review Boards
      10. 32.10 Change Freezes
      11. 32.11 Team Change Management
        1. 32.11.1 Changes Before Weekends
        2. 32.11.2 Preventing Injured Toes
        3. 32.11.3 Revision History
      12. 32.12 Starting with Git
      13. 32.13 Summary
      14. Exercises
    2. 33 Server Upgrades
      1. 33.1 The Upgrade Process
      2. 33.2 Step 1: Develop a Service Checklist
      3. 33.3 Step 2: Verify Software Compatibility
        1. 33.3.1 Upgrade the Software Before the OS
        2. 33.3.2 Upgrade the Software After the OS
        3. 33.3.3 Postpone the Upgrade or Change the Software
      4. 33.4 Step 3: Develop Verification Tests
      5. 33.5 Step 4: Choose an Upgrade Strategy
        1. 33.5.1 Speed
        2. 33.5.2 Risk
        3. 33.5.3 End-User Disruption
        4. 33.5.4 Effort
      6. 33.6 Step 5: Write a Detailed Implementation Plan
        1. 33.6.1 Adding Services During the Upgrade
        2. 33.6.2 Removing Services During the Upgrade
        3. 33.6.3 Old and New Versions on the Same Machine
        4. 33.6.4 Performing a Dress Rehearsal
      7. 33.7 Step 6: Write a Backout Plan
      8. 33.8 Step 7: Select a Maintenance Window
      9. 33.9 Step 8: Announce the Upgrade
      10. 33.10 Step 9: Execute the Tests
      11. 33.11 Step 10: Lock Out Customers
      12. 33.12 Step 11: Do the Upgrade with Someone
      13. 33.13 Step 12: Test Your Work
      14. 33.14 Step 13: If All Else Fails, Back Out
      15. 33.15 Step 14: Restore Access to Customers
      16. 33.16 Step 15: Communicate Completion/Backout
      17. 33.17 Summary
      18. Exercises
    3. 34 Maintenance Windows
      1. 34.1 Process Overview
      2. 34.2 Getting Management Buy-In
      3. 34.3 Scheduling Maintenance Windows
      4. 34.4 Planning Maintenance Tasks
      5. 34.5 Selecting a Flight Director
      6. 34.6 Managing Change Proposals
        1. 34.6.1 Sample Change Proposal: SecurID Server Upgrade
        2. 34.6.2 Sample Change Proposal: Storage Migration
      7. 34.7 Developing the Master Plan
      8. 34.8 Disabling Access
      9. 34.9 Ensuring Mechanics and Coordination
        1. 34.9.1 Shutdown/Boot Sequence
        2. 34.9.2 KVM, Console Service, and LOM
        3. 34.9.3 Communications
      10. 34.10 Change Completion Deadlines
      11. 34.11 Comprehensive System Testing
      12. 34.12 Post-maintenance Communication
      13. 34.13 Reenabling Remote Access
      14. 34.14 Be Visible the Next Morning
      15. 34.15 Postmortem
      16. 34.16 Mentoring a New Flight Director
      17. 34.17 Trending of Historical Data
      18. 34.18 Providing Limited Availability
      19. 34.19 High-Availability Sites
        1. 34.19.1 The Similarities
        2. 34.19.2 The Differences
      20. 34.20 Summary
      21. Exercises
    4. 35 Centralization Overview
      1. 35.1 Rationale for Reorganizing
        1. 35.1.1 Rationale for Centralization
        2. 35.1.2 Rationale for Decentralization
      2. 35.2 Approaches and Hybrids
      3. 35.3 Summary
      4. Exercises
    5. 36 Centralization Recommendations
      1. 36.1 Architecture
      2. 36.2 Security
        1. 36.2.1 Authorization
        2. 36.2.2 Extranet Connections
        3. 36.2.3 Data Leakage Prevention
      3. 36.3 Infrastructure
        1. 36.3.1 Datacenters
        2. 36.3.2 Networking
        3. 36.3.3 IP Address Space Management
        4. 36.3.4 Namespace Management
        5. 36.3.5 Communications
        6. 36.3.6 Data Management
        7. 36.3.7 Monitoring
        8. 36.3.8 Logging
      4. 36.4 Support
        1. 36.4.1 Helpdesk
        2. 36.4.2 End-User Support
      5. 36.5 Purchasing
      6. 36.6 Lab Environments
      7. 36.7 Summary
      8. Exercises
    6. 37 Centralizing a Service
      1. 37.1 Understand the Current Solution
      2. 37.2 Make a Detailed Plan
      3. 37.3 Get Management Support
      4. 37.4 Fix the Problems
      5. 37.5 Provide an Excellent Service
      6. 37.6 Start Slowly
      7. 37.7 Look for Low-Hanging Fruit
      8. 37.8 When to Decentralize
      9. 37.9 Managing Decentralized Services
      10. 37.10 Summary
      11. Exercises
  14. Part VIII Service Recommendations
    1. 38 Service Monitoring
      1. 38.1 Types of Monitoring
      2. 38.2 Building a Monitoring System
      3. 38.3 Historical Monitoring
        1. 38.3.1 Gathering the Data
        2. 38.3.2 Storing the Data
        3. 38.3.3 Viewing the Data
      4. 38.4 Real-Time Monitoring
        1. 38.4.1 SNMP
        2. 38.4.2 Log Processing
        3. 38.4.3 Alerting Mechanism
        4. 38.4.4 Escalation
        5. 38.4.5 Active Monitoring Systems
      5. 38.5 Scaling
        1. 38.5.1 Prioritization
        2. 38.5.2 Cascading Alerts
        3. 38.5.3 Coordination
      6. 38.6 Centralization and Accessibility
      7. 38.7 Pervasive Monitoring
      8. 38.8 End-to-End Tests
      9. 38.9 Application Response Time Monitoring
      10. 38.10 Compliance Monitoring
      11. 38.11 Meta-monitoring
      12. 38.12 Summary
      13. Exercises
    2. 39 Namespaces
      1. 39.1 What Is a Namespace?
      2. 39.2 Basic Rules of Namespaces
      3. 39.3 Defining Names
      4. 39.4 Merging Namespaces
      5. 39.5 Life-Cycle Management
      6. 39.6 Reuse
      7. 39.7 Usage
        1. 39.7.1 Scope
        2. 39.7.2 Consistency
        3. 39.7.3 Authority
      8. 39.8 Federated Identity
      9. 39.9 Summary
      10. Exercises
    3. 40 Nameservices
      1. 40.1 Nameservice Data
        1. 40.1.1 Data
        2. 40.1.2 Consistency
        3. 40.1.3 Authority
        4. 40.1.4 Capacity and Scaling
      2. 40.2 Reliability
        1. 40.2.1 DNS
        2. 40.2.2 DHCP
        3. 40.2.3 LDAP
        4. 40.2.4 Authentication
        5. 40.2.5 Authentication, Authorization, and Accounting
        6. 40.2.6 Databases
      3. 40.3 Access Policy
      4. 40.4 Change Policies
      5. 40.5 Change Procedures
        1. 40.5.1 Automation
        2. 40.5.2 Self-Service Automation
      6. 40.6 Centralized Management
      7. 40.7 Summary
      8. Exercises
    4. 41 Email Service
      1. 41.1 Privacy Policy
      2. 41.2 Namespaces
      3. 41.3 Reliability
      4. 41.4 Simplicity
      5. 41.5 Spam and Virus Blocking
      6. 41.6 Generality
      7. 41.7 Automation
      8. 41.8 Monitoring
      9. 41.9 Redundancy
      10. 41.10 Scaling
      11. 41.11 Security Issues
      12. 41.12 Encryption
      13. 41.13 Email Retention Policy
      14. 41.14 Communication
      15. 41.15 High-Volume List Processing
      16. 41.16 Summary
      17. Exercises
    5. 42 Print Service
      1. 42.1 Level of Centralization
      2. 42.2 Print Architecture Policy
      3. 42.3 Documentation
      4. 42.4 Monitoring
      5. 42.5 Environmental Issues
      6. 42.6 Shredding
      7. 42.7 Summary
      8. Exercises
    6. 43 Data Storage
      1. 43.1 Terminology
        1. 43.1.1 Key Individual Disk Components
        2. 43.1.2 RAID
        3. 43.1.3 Volumes and File Systems
        4. 43.1.4 Directly Attached Storage
        5. 43.1.5 Network-Attached Storage
        6. 43.1.6 Storage-Area Networks
      2. 43.2 Managing Storage
        1. 43.2.1 Reframing Storage as a Community Resource
        2. 43.2.2 Conducting a Storage-Needs Assessment
        3. 43.2.3 Mapping Groups onto Storage Infrastructure
        4. 43.2.4 Developing an Inventory and Spares Policy
        5. 43.2.5 Planning for Future Storage
        6. 43.2.6 Establishing Storage Standards
      3. 43.3 Storage as a Service
        1. 43.3.1 A Storage SLA
        2. 43.3.2 Reliability
        3. 43.3.3 Backups
        4. 43.3.4 Monitoring
        5. 43.3.5 SAN Caveats
      4. 43.4 Performance
        1. 43.4.1 RAID and Performance
        2. 43.4.2 NAS and Performance
        3. 43.4.3 SSDs and Performance
        4. 43.4.4 SANs and Performance
        5. 43.4.5 Pipeline Optimization
      5. 43.5 Evaluating New Storage Solutions
        1. 43.5.1 Drive Speed
        2. 43.5.2 Fragmentation
        3. 43.5.3 Storage Limits: Disk Access Density Gap
        4. 43.5.4 Continuous Data Protection
      6. 43.6 Common Data Storage Problems
        1. 43.6.1 Large Physical Infrastructure
        2. 43.6.2 Timeouts
        3. 43.6.3 Saturation Behavior
      7. 43.7 Summary
      8. Exercises
    7. 44 Backup and Restore
      1. 44.1 Getting Started
      2. 44.2 Reasons for Restores
        1. 44.2.1 Accidental File Deletion
        2. 44.2.2 Disk Failure
        3. 44.2.3 Archival Purposes
        4. 44.2.4 Perform Fire Drills
      3. 44.3 Corporate Guidelines
      4. 44.4 A Data-Recovery SLA and Policy
      5. 44.5 The Backup Schedule
      6. 44.6 Time and Capacity Planning
        1. 44.6.1 Backup Speed
        2. 44.6.2 Restore Speed
        3. 44.6.3 High-Availability Databases
      7. 44.7 Consumables Planning
        1. 44.7.1 Tape Inventory
        2. 44.7.2 Backup Media and Off-Site Storage
      8. 44.8 Restore-Process Issues
      9. 44.9 Backup Automation
      10. 44.10 Centralization
      11. 44.11 Technology Changes
      12. 44.12 Summary
      13. Exercises
    8. 45 Software Repositories
      1. 45.1 Types of Repositories
      2. 45.2 Benefits of Repositories
      3. 45.3 Package Management Systems
      4. 45.4 Anatomy of a Package
        1. 45.4.1 Metadata and Scripts
        2. 45.4.2 Active Versus Dormant Installation
        3. 45.4.3 Binary Packages
        4. 45.4.4 Library Packages
        5. 45.4.5 Super-Packages
        6. 45.4.6 Source Packages
      5. 45.5 Anatomy of a Repository
        1. 45.5.1 Security
        2. 45.5.2 Universal Access
        3. 45.5.3 Release Process
        4. 45.5.4 Multitiered Mirrors and Caches
      6. 45.6 Managing a Repository
        1. 45.6.1 Repackaging Public Packages
        2. 45.6.2 Repackaging Third-Party Software
        3. 45.6.3 Service and Support
        4. 45.6.4 Repository as a Service
      7. 45.7 Repository Client
        1. 45.7.1 Version Management
        2. 45.7.2 Tracking Conflicts
      8. 45.8 Build Environment
        1. 45.8.1 Continuous Integration
        2. 45.8.2 Hermetic Build
      9. 45.9 Repository Examples
        1. 45.9.1 Staged Software Repository
        2. 45.9.2 OS Mirror
        3. 45.9.3 Controlled OS Mirror
      10. 45.10 Summary
      11. Exercises
    9. 46 Web Services
      1. 46.1 Simple Web Servers
      2. 46.2 Multiple Web Servers on One Host
        1. 46.2.1 Scalable Techniques
        2. 46.2.2 HTTPS
      3. 46.3 Service Level Agreements
      4. 46.4 Monitoring
      5. 46.5 Scaling for Web Services
        1. 46.5.1 Horizontal Scaling
        2. 46.5.2 Vertical Scaling
        3. 46.5.3 Choosing a Scaling Method
      6. 46.6 Web Service Security
        1. 46.6.1 Secure Connections and Certificates
        2. 46.6.2 Protecting the Web Server Application
        3. 46.6.3 Protecting the Content
        4. 46.6.4 Application Security
      7. 46.7 Content Management
      8. 46.8 Summary
      9. Exercises
  15. Part IX Management Practices
    1. 47 Ethics
      1. 47.1 Informed Consent
      2. 47.2 Code of Ethics
      3. 47.3 Customer Usage Guidelines
      4. 47.4 Privileged-Access Code of Conduct
      5. 47.5 Copyright Adherence
      6. 47.6 Working with Law Enforcement
      7. 47.7 Setting Expectations on Privacy and Monitoring
      8. 47.8 Being Told to Do Something Illegal/Unethical
      9. 47.9 Observing Illegal Activity
      10. 47.10 Summary
      11. Exercises
    2. 48 Organizational Structures
      1. 48.1 Sizing
      2. 48.2 Funding Models
      3. 48.3 Management Chain’s Influence
      4. 48.4 Skill Selection
      5. 48.5 Infrastructure Teams
      6. 48.6 Customer Support
      7. 48.7 Helpdesk
      8. 48.8 Outsourcing
      9. 48.9 Consultants and Contractors
      10. 48.10 Sample Organizational Structures
        1. 48.10.1 Small Company
        2. 48.10.2 Medium-Size Company
        3. 48.10.3 Large Company
        4. 48.10.4 E-commerce Site
        5. 48.10.5 Universities and Nonprofit Organizations
      11. 48.11 Summary
      12. Exercises
    3. 49 Perception and Visibility
      1. 49.1 Perception
        1. 49.1.1 A Good First Impression
        2. 49.1.2 Attitude, Perception, and Customers
        3. 49.1.3 Aligning Priorities with Customer Expectations
        4. 49.1.4 The System Advocate
      2. 49.2 Visibility
        1. 49.2.1 System Status Web Page
        2. 49.2.2 Management Meetings
        3. 49.2.3 Physical Visibility
        4. 49.2.4 Town Hall Meetings
        5. 49.2.5 Newsletters
        6. 49.2.6 Mail to All Customers
        7. 49.2.7 Lunch
      3. 49.3 Summary
      4. Exercises
    4. 50 Time Management
      1. 50.1 Interruptions
        1. 50.1.1 Stay Focused
        2. 50.1.2 Splitting Your Day
      2. 50.2 Follow-Through
      3. 50.3 Basic To-Do List Management
      4. 50.4 Setting Goals
      5. 50.5 Handling Email Once
      6. 50.6 Precompiling Decisions
      7. 50.7 Finding Free Time
      8. 50.8 Dealing with Ineffective People
      9. 50.9 Dealing with Slow Bureaucrats
      10. 50.10 Summary
      11. Exercises
    5. 51 Communication and Negotiation
      1. 51.1 Communication
      2. 51.2 I Statements
      3. 51.3 Active Listening
        1. 51.3.1 Mirroring
        2. 51.3.2 Summary Statements
        3. 51.3.3 Reflection
      4. 51.4 Negotiation
        1. 51.4.1 Recognizing the Situation
        2. 51.4.2 Format of a Negotiation Meeting
        3. 51.4.3 Working Toward a Win-Win Outcome
        4. 51.4.4 Planning Your Negotiations
      5. 51.5 Additional Negotiation Tips
        1. 51.5.1 Ask for What You Want
        2. 51.5.2 Don’t Negotiate Against Yourself
        3. 51.5.3 Don’t Reveal Your Strategy
        4. 51.5.4 Refuse the First Offer
        5. 51.5.5 Use Silence as a Negotiating Tool
      6. 51.6 Further Reading
      7. 51.7 Summary
      8. Exercises
    6. 52 Being a Happy SA
      1. 52.1 Happiness
      2. 52.2 Accepting Criticism
      3. 52.3 Your Support Structure
      4. 52.4 Balancing Work and Personal Life
      5. 52.5 Professional Development
      6. 52.6 Staying Technical
      7. 52.7 Loving Your Job
      8. 52.8 Motivation
      9. 52.9 Managing Your Manager
      10. 52.10 Self-Help Books
      11. 52.11 Summary
      12. Exercises
    7. 53 Hiring System Administrators
      1. 53.1 Job Description
      2. 53.2 Skill Level
      3. 53.3 Recruiting
      4. 53.4 Timing
      5. 53.5 Team Considerations
      6. 53.6 The Interview Team
      7. 53.7 Interview Process
      8. 53.8 Technical Interviewing
      9. 53.9 Nontechnical Interviewing
      10. 53.10 Selling the Position
      11. 53.11 Employee Retention
      12. 53.12 Getting Noticed
      13. 53.13 Summary
      14. Exercises
    8. 54 Firing System Administrators
      1. 54.1 Cooperate with Corporate HR
      2. 54.2 The Exit Checklist
      3. 54.3 Removing Access
        1. 54.3.1 Physical Access
        2. 54.3.2 Remote Access
        3. 54.3.3 Application Access
        4. 54.3.4 Shared Passwords
        5. 54.3.5 External Services
        6. 54.3.6 Certificates and Other Secrets
      4. 54.4 Logistics
      5. 54.5 Examples
        1. 54.5.1 Amicably Leaving a Company
        2. 54.5.2 Firing the Boss
        3. 54.5.3 Removal at an Academic Institution
      6. 54.6 Supporting Infrastructure
      7. 54.7 Summary
      8. Exercises
  16. Part X Being More Awesome
    1. 55 Operational Excellence
      1. 55.1 What Does Operational Excellence Look Like?
      2. 55.2 How to Measure Greatness
      3. 55.3 Assessment Methodology
        1. 55.3.1 Operational Responsibilities
        2. 55.3.2 Assessment Levels
        3. 55.3.3 Assessment Questions and Look-For’s
      4. 55.4 Service Assessments
        1. 55.4.1 Identifying What to Assess
        2. 55.4.2 Assessing Each Service
        3. 55.4.3 Comparing Results Across Services
        4. 55.4.4 Acting on the Results
        5. 55.4.5 Assessment and Project Planning Frequencies
      5. 55.5 Organizational Assessments
      6. 55.6 Levels of Improvement
      7. 55.7 Getting Started
      8. 55.8 Summary
      9. Exercises
    2. 56 Operational Assessments
      1. 56.1 Regular Tasks (RT)
      2. 56.2 Emergency Response (ER)
      3. 56.3 Monitoring and Metrics (MM)
      4. 56.4 Capacity Planning (CP)
      5. 56.5 Change Management (CM)
      6. 56.6 New Product Introduction and Removal (NPI/NPR)
      7. 56.7 Service Deployment and Decommissioning (SDD)
      8. 56.8 Performance and Efficiency (PE)
      9. 56.9 Service Delivery: The Build Phase
      10. 56.10 Service Delivery: The Deployment Phase
      11. 56.11 Toil Reduction
      12. 56.12 Disaster Preparedness
    3. Epilogue
  17. Part XI Appendices
    1. A What to Do When . . .
    2. B The Many Roles of a System Administrator
      1. B.1 Common Positive Roles
      2. B.2 Negative Roles
      3. B.3 Team Roles
      4. B.4 Summary
      5. Exercises
  18. Bibliography
  19. Index

Product information

  • Title: The Practice of System and Network Administration: Volume 1: DevOps and other Best Practices for Enterprise IT, 3rd Edition
  • Author(s): Thomas A. Limoncelli, Strata R. Chalup, Christina J. Hogan
  • Release date: November 2016
  • Publisher(s): Addison-Wesley Professional
  • ISBN: 9780133415087