
Platform Engineering

by Camille Fournier, Ian Nowland

October 2024

Intermediate to advanced

324 pages

10h 45m

English

O'Reilly Media, Inc.


Foreword
Preface
  A Note from Camille
  Who This Book Is For
  How to Read This Book
  O’Reilly Online Learning
  How to Contact Us
  Acknowledgments
  From Camille
  From Ian
  From Both of Us
I. The What and Why of Platform Engineering
1. Why Platform Engineering Is Becoming Essential
  Defining “Platform” and Other Important Terms
  The Over-General Swamp
  How We Got Stuck in the Over-General Swamp
  Change #1: Explosion of Choice
  Change #2: Higher Operational Needs
  Result: Drowning in the Swamp
  How Platform Engineering Clears the Swamp
  Limiting Primitives While Minimizing Overhead
  Reducing Per-Application Glue
  Centralizing the Cost of Migrations
  Allowing Application Developers to Operate What They Develop
  Empowering Teams to Focus on Building Platforms
  Wrapping Up
2. The Pillars of Platform Engineering
  Taking a Curated Product Approach
  Developing Software-Based Abstractions
  The Major Abstractions: Platform Service and Its APIs
  Thick Clients
  OSS Customizations
  Integrating Metadata Registries
  Serving a Broad Base of Application Developers
  Operating as Foundations
  Responsibility for the Full Platform
  Supporting the Platform
  Operational Discipline
  Wrapping Up
II. Platform Engineering Practices
3. How and When to Get Started
  Fostering Platform Cooperation at Small Scale
  Creating the Platform Teams That Replace Cooperation
  Are the Benefits of Centralizing Ownership Worth the Costs?
  Realize the Collective Dynamic Is Gone
  Focus on Solving Problems, Not New Technology or Architecture
  Beware of New Engineers Coming from Much Bigger Companies
  Be Slow to Hire Product Managers (and Avoid Project Managers)
  Bonus Problems for Integration/Shared Services Platforms
  Transforming a Traditional Infrastructure Organization
  Your Whole Engineering Culture Has to Change
  Identify the Most Promising Areas to Start
  Recognize That You Can’t Just Rub Product Managers on It and Call It a Day
  Change the Way You Support Your Products
  Update Your Interview Process
  Update Your Systems of Recognition and Reward
  Don’t Have Too Many Project Managers
  Accept That Your Team Will Spend More Time Talking to Customers and Less Time Writing Code
  Do the Necessary Restructuring
  Keep It Fun!
  Wrapping Up
4. Building Great Platform Teams
  The Risks of Single-Focus Platform Teams
  Too Much Systems Focus
  Too Much Development Focus
  The Different Roles of Platform Engineers
  Software Engineers
  Systems Engineers
  Reliability Engineers
  Systems Specialists
  Hiring and Recognizing Engineers in All Roles
  Allow Role-Specific Titles
  Avoid Creating a New Software Engineer Level Matrix
  Have, at Most, One Level Matrix for the Systems Roles
  If Needed, Create a New Software Engineer Interview Process
  Vary the Interview Only Slightly for Systems Roles
  Interview for Customer Empathy
  What Makes a Great Platform Engineering Manager?
  Experience Operating Platforms
  Experience on Big, Long-Running Projects
  Attention to Detail
  Other Roles on a Platform Team
  Product Managers
  Product Owners
  Project Managers/Technical Program Managers
  Developer Advocates, Technical Writers, and Support Engineers
  Creating a Platform Engineering Team Culture
  A Platform Split Between a Development and an SRE Team
  Strengths and Weaknesses of the Development Team
  Merging the Teams and Adding Product Management
  Instilling a Platform Engineering Culture
  Wrapping Up
5. Platform as a Product
  Product Culture Focuses on the Customer
  Characteristics of Internal Customers
  Collaborating with Internal Customers
  Empathizing with Customers
  Escaping the Feature Shop Trap to Serve Customers More Broadly
  Product Discovery and Market Analysis
  Identifying Potential Platform Products
  Evolving Existing Offerings: Smoothing the Edges or Rethinking the Problem
  Market Research: Validating New Investments
  Product Metrics
  Successful Product Execution: Creating a Product Roadmap
  Vision: Long Term
  Strategy: Middle Term
  Goals and Metrics: This Year
  Milestones: Quarterly
  The Customer-Facing Roadmap
  Specification of Features
  Practice Makes Perfect
  Product Failure Modes
  Underestimating the Migration Cost
  Overestimating the Change Budget for Users
  Overestimating the Value of New Features When Stability Is Poor
  Having Too Many Product Managers for the Size of the Engineering Team
  Having Product Managers Doing the Work That Engineering Managers Should Be Doing
  Wrapping Up
6. Operating Platforms
  On-Call Practices
  Why 24x7 On-Call Coverage Matters
  Why Merged DevOps?
  Getting to a Sustainable On-Call Load
  Support Practices
  Why Platform Engineers Should Do Support Work
  Stage 1: Formalize Support Levels
  Stage 2: Separate Noncritical Support from On-Call
  Stage 3: Hire a Support Specialist
  Stage 4: At Scale with an Engineering Support Organization
  Operational Feedback Practices
  SLOs and SLAs Are Necessary; Error Budgets Are Optional
  Change Management
  Synthetic Monitoring
  Operational Reviews
  Wrapping Up

7. Planning and Delivery
  Planning Long-Running Projects
  Clarifying Goals and Requirements in a Proposal Document
  Going from Proposal to Action Plan
  Avoiding the Long Slog
  Bottom-Up Roadmap Planning
  “Keep the Lights On” Work
  Mandates
  System Improvements
  Bringing It All Together
  Communicating Status with Biweekly Wins and Challenges
  The Basics
  Why: What’s the Value?
  What: Structuring Wins and Challenges Updates
  Don’t Forget the Challenges!
  Getting Your Team to Write Wins and Challenges
  Wrapping Up
8. Rearchitecting Platforms
  Why Rearchitecting Is Preferred to Building a v2
  Different Engineering Mindsets
  Architectural Needs Drive Mindset Demands
  Why It Is Hard to Build v2 Platforms, but Possible to Rearchitect
  Addressing Security with Architecture
  Guardrails for Rearchitectures
  Compatibility
  Testing
  Lower Environments
  Tranches, Slow Rollouts, and Staying a Version Behind
  Planning for Rearchitectures
  Step 1: Think Big on Final Rearchitecture Goals
  Step 2: Factor in Migration Costs
  Step 3: Determine Major 12-Month Wins
  Step 4: Get Leadership Buy-in, and Be Prepared to Wait
  Wrapping Up
9. Migrations and Sunsetting of Platforms
  Migration Antipatterns
  Engineering Easier Migrations
  Use Product Abstractions That Minimize Glue and Limit Variation
  Architect for Transparent Migrations
  Track Usage Metadata
  Develop Automation to Avoid Clipboards
  Document On-Ramps and Off-Ramps
  Coordinating Smoother Migrations
  Scope, Limit, and Prioritize Planned Changes
  Communicate Early and Publicly
  Push Through the Final 20%
  Use Mandates Sparingly
  Sunsetting Platforms
  Deciding When to Sunset
  Coordinating the Sunsetting
  Don’t Be Afraid to Sunset When It Makes Sense
  Wrapping Up
10. Managing Stakeholder Relationships
  Stakeholder Mapping: The Power-Interest Grid
  Communicating with the Right Transparency
  Beware of Oversharing Detail
  Use Regular 1:1s Judiciously
  Track Expectations and Commitments
  Scale Up with Interlock Meetings and Customer Advisory Boards
  Increase Communication During Rough Patches
  Finding Acceptable Compromises
  Be Clear About the Business Impact
  Sometimes Say “Yes, with Compromises”
  Saying “No” Without Ruining the Relationship
  Compromising on Shadow Platforms
  Money Troubles: Cost and Budget Management
  Step 1: Figure Out Who Will Benefit Tomorrow
  Step 2: Group the Work into Teams (Don’t Go Person-by-Person)
  Step 3: Come with Suggestions of What to Cut and Strong Opinions About What to Keep
  Wrapping Up
III. What Does Success Look Like?
11. Your Platforms Are Aligned
  Alignment to Purpose
  Align Teams to Purpose with the Right Mix of People
  Align Culture to Purpose with Common Practices
  Align Culture to Purpose by Having Teams Collaborate
  Alignment of Product Strategy
  Foster Cross-Platform Thinking with Independent Product Management
  Foster Cross-Platform Architecture with Independent Lead ICs
  Seek Feedback from Comments in Platform-wide Customer Surveys
  Judiciously Resolve Misalignment with Restructuring
  Alignment of Plans
  Align Only on Larger Projects, Not on Every Detail
  Be Forthright in Confronting Misalignment
  Final Alignment Comes from Principled Leadership
  Tying It Together: Getting an Organization to Alignment
  Wrapping Up
12. Your Platforms Are Trusted
  Trust in How You Operate
  Accelerate Trust by Empowering Experienced Leaders
  Optimize Growth in Trust by Ordering Use Cases
  Trust in Your Big Investments
  Seek Technical Stakeholder Buy-in for Trust of Rearchitectures
  Seek Executive Sponsorship for Trust of New Products
  Maintain Old Systems to Retain Trust
  Gaining Trust Requires Flexibility on What Is “Right”
  Trust to Prioritize Delivery
  Create a Culture of Velocity
  Prioritize Projects to Free Up Team Capacity
  Challenge Assumptions About Product Scope
  Tying It Together: The Case of the Overcoupled Platform
  Wrapping Up
13. Your Platforms Manage Complexity
  Managing the Accidental Complexity of Human Coordination
  Managing the Complexity of Shadow Platforms
  Managing Complexity by Controlling Growth
  Managing Complexity Through Product Discovery
  Tying It Together: Balancing Internal and External Complexity
  Burning Out on OSS Operations
  Trying (and Failing) to Change the Game
  Shadow Platforms Force a Reset
  Executing on the Reset
  Wrapping Up
14. Your Platforms Are Loved
  Love Just Works
  Love Can Look Like a Hack
  Love Can Be Obvious
  Tying It Together: Love Makes Your Users Awesome
  Wrapping Up: What Is Love? Baby Don’t Hurt Me
Concluding Remarks
Index
About the Authors


Chapter 6. Operating Platforms

Rare things become common at scale.

Jason Cohen¹

No matter how well you build a platform, the systems it depends on are complex, so it will inevitably have operational issues. As useful as the product mindset is for platforms, product-focused teams can underinvest in operations when times are good—they move fast and deliver lots of great features, but pile up operational debt along the way. A successful application team might be able to get away with this, because their contributions to the business’s top line are rewarded with extra headcount, which makes it possible to stay ahead of the debt. But that’s not the situation most platform teams are in.

Platforms create their value through leverage, and one aspect of leverage is efficiency: supporting substantially more scale without needing to hire more people into the platform team. However, as this chapter’s introductory quote suggests, this conflicts with the fact that systems often run into new problems just because of scale, particularly operational ones. This means constant-sized teams supporting scaling platforms can wind up in “operational hell,” where neglected operational problems start having ongoing acute business impact, eroding customer trust. Because the system is handling critical load at scale, it can take months to remediate the acute impact and years to address the core issues, and all the while new product features are stalled.

To avoid this, platform teams need to routinely invest ...

ISBN: 9781098153632