Book description
“There’s an incredible amount of depth and thinking in the practices described here, and it’s impressive to see it all in one place.”
—Win Treese, coauthor of Designing Systems for Internet Commerce
The Practice of Cloud System Administration, Volume 2, focuses on “distributed” or “cloud” computing and brings a DevOps/SRE sensibility to the practice of system administration. Unsatisfied with books that cover either design or operations in isolation, the authors created this authoritative reference centered on a comprehensive approach.
Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants are explained in practical ways that are useful to all enterprises. The new companion to the best-selling first volume, The Practice of System and Network Administration, Second Edition, this guide offers expert coverage of the following and many other crucial topics:
Designing and building modern web and distributed systems
- Fundamentals of large system design
- Understand the new software engineering implications of cloud administration
- Make systems that are resilient to failure and grow and scale dynamically
- Implement DevOps principles and cultural changes
- IaaS/PaaS/SaaS and virtual platform selection
Operating and running systems using the latest DevOps/SRE strategies
- Upgrade production systems with zero downtime
- What and how to automate; how to decide what not to automate
- On-call best practices that improve uptime
- Why distributed systems require fundamentally different system administration techniques
- Identify and resolve resiliency problems before they surprise you
Assessing and evaluating your team’s operational effectiveness
- Manage the scientific process of continuous improvement
- A forty-page, pain-free assessment system you can start using today
Table of contents
- About This eBook
- Title Page
- Copyright Page
- Contents at a Glance
- Contents
- Preface
- About the Authors
- Introduction
- Part I Design: Building It
- Chapter 1. Designing in a Distributed World
- Chapter 2. Designing for Operations
- 2.1 Operational Requirements
- 2.1.1 Configuration
- 2.1.2 Startup and Shutdown
- 2.1.3 Queue Draining
- 2.1.4 Software Upgrades
- 2.1.5 Backups and Restores
- 2.1.6 Redundancy
- 2.1.7 Replicated Databases
- 2.1.8 Hot Swaps
- 2.1.9 Toggles for Individual Features
- 2.1.10 Graceful Degradation
- 2.1.11 Access Controls and Rate Limits
- 2.1.12 Data Import Controls
- 2.1.13 Monitoring
- 2.1.14 Auditing
- 2.1.15 Debug Instrumentation
- 2.1.16 Exception Collection
- 2.1.17 Documentation for Operations
- 2.2 Implementing Design for Operations
- 2.3 Improving the Model
- 2.4 Summary
- Exercises
- Chapter 3. Selecting a Service Platform
- Chapter 4. Application Architectures
- Chapter 5. Design Patterns for Scaling
- Chapter 6. Design Patterns for Resiliency
- Part II Operations: Running It
- Chapter 7. Operations in a Distributed World
- Chapter 8. DevOps Culture
- Chapter 9. Service Delivery: The Build Phase
- Chapter 10. Service Delivery: The Deployment Phase
- Chapter 11. Upgrading Live Services
- 11.1 Taking the Service Down for Upgrading
- 11.2 Rolling Upgrades
- 11.3 Canary
- 11.4 Phased Roll-outs
- 11.5 Proportional Shedding
- 11.6 Blue-Green Deployment
- 11.7 Toggling Features
- 11.8 Live Schema Changes
- 11.9 Live Code Changes
- 11.10 Continuous Deployment
- 11.11 Dealing with Failed Code Pushes
- 11.12 Release Atomicity
- 11.13 Summary
- Exercises
- Chapter 12. Automation
- Chapter 13. Design Documents
- Chapter 14. Oncall
- Chapter 15. Disaster Preparedness
- Chapter 16. Monitoring Fundamentals
- Chapter 17. Monitoring Architecture and Practice
- Chapter 18. Capacity Planning
- Chapter 19. Creating KPIs
- Chapter 20. Operational Excellence
- Epilogue
- Part III Appendices
- Appendix A. Assessments
- A.1 Regular Tasks (RT)
- A.2 Emergency Response (ER)
- A.3 Monitoring and Metrics (MM)
- A.4 Capacity Planning (CP)
- A.5 Change Management (CM)
- A.6 New Product Introduction and Removal (NPI/NPR)
- A.7 Service Deployment and Decommissioning (SDD)
- A.8 Performance and Efficiency (PE)
- A.9 Service Delivery: The Build Phase
- A.10 Service Delivery: The Deployment Phase
- A.11 Toil Reduction
- A.12 Disaster Preparedness
- Appendix B. The Origins and Future of Distributed Computing and Clouds
- Appendix C. Scaling Terminology and Concepts
- Appendix D. Templates and Examples
- Appendix E. Recommended Reading
- Bibliography
- Index
Product information
- Title: Practice of Cloud System Administration, The: DevOps and SRE Practices for Web Services, Volume 2
- Author(s):
- Release date: September 2014
- Publisher(s): Addison-Wesley Professional
- ISBN: 9780133478549