Enabling Microservice Success

Name: Enabling Microservice Success
Author: Sarah Wells
ISBN: 9781098130794

by Sarah Wells

March 2024

Intermediate to advanced

450 pages

12h 48m

English

O'Reilly Media, Inc.

Foreword
Preface
Why I Wrote This Book · Who Should Read This Book · Navigating This Book · Part I: Context · Part II: Organizational Structure and Culture · Part III: Building and Operating · Appendixes · Case Studies · Conventions Used in This Book · O’Reilly Online Learning · How to Contact Us · Acknowledgments
I. Context
1. Understanding Microservices
Defining the Microservices Architectural Style · A Suite of Services · Each Running in Its Own Process · Communicating with Lightweight Mechanisms · Built Around Business Capabilities · Independently Deployable · “Small” · With a Bare Minimum of Centralized Management · Heterogeneous · Forerunners and Alternatives · The Monolith · Modular Monoliths · Service-Oriented Architecture · The Microservices Ecosystem · Infrastructure as Code · Continuous Delivery · The Public Cloud · New Deployment Options · DevOps · Observability · Advantages of Microservices · Independently Scalable · Robust · Easy to Release Small Changes Frequently · Support Flexible Technology Choices · Challenges of Microservices · Latency · Estate Complexity · Operational Complexity · Data Consistency · Security · Finding the Right Level of Granularity · Handling Change · Require Organizational Change · Change the Developer Experience · In Summary
2. Effective Software Delivery
Regularly Delivering Business Value · High Deployment Frequency · Short Lead Time for Changes · Running Experiments · Separating Deploying Code from Releasing Functionality · Handling Work That Goes Across Team Boundaries · Adapting to Changing Priorities · Maintaining Appropriate Service Levels · When a Release Goes Wrong · Knowing When Something Important Is Broken · Restore Some Level of Service Quickly · Avoid Failure Cascades · Spending Most of Your Time on Meaningful Work · Not Having to Start Again · Keeping Risk at an Acceptable Level · How Microservices Measure Up · In Summary
3. Are Microservices Right for You?
Reasons to Choose Microservices · Scaling the Organization · Developer Experience · Separating Out Areas with Compliance and Security Requirements · Scaling for Load · Increasing Robustness · Increasing Flexibility · Conditions for Success · Domain Understanding · Products Not Projects · Leadership Support · Teams That Want Autonomy · Processes That Enable Autonomy · Technical Maturity · Managing Change · Sticking with a Monolithic Architecture · Enable Zero-Downtime Deployments · Build a Modular Monolith · Everything Is Distributed Now · The Rise of Cloud Native · SaaS Makes Sense · Recommendations · Starting from Scratch · Replacing an Existing Monolith · Measuring Success · In Summary
II. Organizational Structure and Culture
4. Conway’s Law and Finding the Right Boundaries
Conway’s Law · The Inverse Conway Maneuver · Possible Boundaries · Business Domains · Locations · Technologies · Compliance · Tolerance for Failure · Frequency of Changes · Recommendations · Identifying When Boundaries Are Wrong · In Summary
5. Building Effective Teams
Organizational Culture · Open · Learning · Empowering · Optimized for Change · The Westrum Model · Effective Teams · Motivated through Autonomy, Mastery, and Purpose · Aligned to Business Domain · Appropriately Sized · Cross-Functional and T-shaped · Strong Ownership · Long Lived · Sustainable Cognitive Load · High Trust and High Psychological Safety · Part of a Group · Optimizing for Flow · Stream-Aligned · Enabling · Complicated Subsystem · Platform · In Summary
6. Enabling Autonomy
What Is Autonomy? · Why Does Autonomy Matter? · Limits to Autonomy · The Right Amount of Communication · Interaction Styles · Collaboration · X-as-a-Service · Facilitating · Ways of Working That Support Autonomy · Aligning on Outcomes · Light Touch Governance · Trust but Verify · Agreeing and Aligning on Technology · The Role of the Individual Contributor · Minimum Viable Competencies · Making Space for Learning · Responsibilities of Autonomous Teams · Active Ownership · Communication and Cooperation · Compliance with Standards · Maintaining a Team Page · In Summary

7. Engineering Enablement and Paving the Road
What’s in a Name? · Building a Platform · Platform Services · Organization-Level Concerns · Building the Thinnest Viable Platform · Build for the Needs of the Majority · Platform as a Product · Beyond the Platform · Vendor Engineering · APIs, Templates, Libraries, and Examples · A Service Catalog · Insights · Paving the Road · What Capabilities to Include · Make It Optional · Keep It Small · How to Go Off Road · Bringing the Treasure Back · Internal Developer Portals · Building a Platform People Actually Use · Making Sure What You Build Meets a Need · Market It · Look for Signs You Are Getting It Wrong · Principles for Building a Paved Road · Optional · Provides Value · Self-Service · Owned and Supported · Easy to Use · Guides People to Do the Right Thing · Composable and Extendable · Measuring Impact · When to Invest in Engineering Enablement · In Summary
8. Ensuring “You Build It, You Run It”
Why Microservices Implies DevOps · Release on Demand · Work on Operational Features · Building Things Differently · Good Runbooks · Running on Someone Else’s Servers · Getting Comfortable in Production · Supporting Your System in Production · Assign Dedicated In-Hours Ops Support · Improve Alerts and Documentation · Identify the Haunted Forests · Practice · Out-of-Hours Support · Allow People to Opt Out · Formal Rotas Versus Best Endeavors · Make Sure Calls Are Rare · Only for Critical Systems · Provide Support and Guidance · Incident Management · Blameless Culture · Raising an Incident · Roles to Assign · During the Incident · After the Incident · Learning from Incidents · In Summary
III. Building and Operating
9. Active Service Ownership
Responding to the Log4Shell Vulnerability · A Counter Example: Equifax and a Struts Vulnerability · Ownership During Active Development · Strong Ownership · Weak Ownership · Collective Ownership · Once a Service Is Feature Complete · No Ownership · Nominal Ownership · Active Ownership · What Active Ownership Means · Code Stewardship · Upgrades and Patching · Migrations · Production Support · Documentation · Knowing Your Estate · Your Own Software · Dependencies · Third-Party Software · What You Need from a Service Catalog · Graph-Based Model · API-Driven · Extensible · Flexible Schema · Provides Different Views Across the Estate · Transferring Ownership · What Does a Good Transfer Look Like? · Meeting Quality Expectations · Operational Handover · Replacing · What to Do If You’re Struggling · Make the Business Case · Start with Critical Systems · Make Your Best Guess at Owners · Deliver Value from the Data · Aim for Continuous Improvement · Look for Teams That Are Overwhelmed · Services Shouldn’t Live Forever · In Summary
10. Getting Value from Testing
Why Do We Test? · Building the Thing Right · Building the Right Thing · Picking Up Regressions · Meeting Quality-of-Service Requirements · Shifting Testing Left · What Makes a Good Test? · Fast and Early Feedback · Easy to Change · Finds Real Problems · Types of Testing · The Testing Pyramid · Unit Tests · Service Tests · End-to-End Tests · Contract Tests · Consistency Tests · Exploratory Tests · Cross-Functional Testing · Testing in Production · Is It Safe? · Staging Is Not Production-Like · Your Customers Can Surprise You · You Can’t Test for Every Variation · You Don’t Have to Roll a Change Out to Everyone · Monitoring as Testing · Testing Your Infrastructure · Chaos Engineering · Testing Failovers and Restores · Quality Is About More Than Testing · What to Do If You’re Struggling · Not Enough Automated Testing · Tests That Aren’t Providing Value · In Summary
11. Governance and Standardization: Finding the Balance
Why Governance Matters · Know Your Estate · What Sort of Information Is Relevant? · Guardrails and Policies · Automating Guardrails · What to Include · The FT’s Guardrails · Aligning on Guardrails · Tech Governance Group · Benefits of the TGG · Choosing Technologies · The Technology Lifecycle · Save Innovation for Key Business Outcomes · Use Boring Technology · Limit the Alternatives · Be Clear on Where Duplication Is Acceptable · Expect Things to Change · Insight Leads to Action · Governance in Other Organizations · Governance at Monzo · Governance at Skyscanner · What to Do If You’re Struggling · In Summary
12. Building Resilience In
What Is Resilience? · Resilience for Distributed Systems · Resilience for Microservices · Understanding Your Service Level Requirements · Service Level Objectives · Error Budgets · Building Resilient Services · Redundancy · Fast Startup and Graceful Shutdown · Set Appropriate Timeouts · Back Off and Retry · Make Your Requests Idempotent · Protect Yourself · Testing Service Resilience · Make Building Resilient Services Easy · Building Resilient Systems · Caching · Handling Cascading Failures · Fallback Behavior · Avoiding Unnecessary Work · Go Asynchronous · Failover · Backup and Restore · Disaster Recovery · Building a Resilient Platform · Resilience to External Issues · Internal Tooling · Validating Your Resilience Choices · Chaos Engineering · Testing Backup and Restore · Practice Makes Perfect · Load Testing · Learn from Incidents · One Thing at a Time · What to Do If You’re Struggling · In Summary
13. Running Your System in Production
Operational Challenges of Microservices · Different Technologies Mean Different Support Knowledge Is Needed · Ephemeral Infrastructure · Rapid Change · Alert Overload · Complex Systems Run in Degraded Mode · Building Observability In · Logging · Monitoring and Metrics · Log Aggregation · OpenTelemetry · Focus on Events · Distributed Tracing · Archiving Observability Data · Building Your Own Tools · Spotting Issues · Getting Alerting Right · Healthchecks · Monitoring Business Outcomes · Understanding What Normal Looks Like · Mitigation · Troubleshooting · Maintaining Useful Documentation · Knowing What’s Changed · Problems with External Systems · Tooling Characteristics · Learning from Incidents · What to Do If You’re Struggling · In Summary
14. Keeping Things Up-to-Date
Why Is This a Challenge? · Minimizing the Impact of Change · Think About the Long Term · A Reason to Be on the Paved Road · Choose Managed Services and SaaS Options · Provide APIs · Immutable and Ephemeral Infrastructure · Decommission and Deprecate · Types of Change · Emergency Changes · Minor Planned Changes · Major Planned Changes · Responding to Change · Understand the Landscape · Define Guiding Policies · Making a Decision · Who Gets to Decide? · Scheduling Work · Managing Change · Clarity · Communication · Empathy · Execution · What to Do If You’re Struggling · In Summary
Afterword
Why Microservices? · The Importance of Flow · Support for Autonomy · The Rise of Platform Engineering · Wrapping Up
A. Microservices Assessment
Do You Need Microservices? · Scaling Challenges · Technical Reasons · Spotting Potential Pitfalls · Organizational Structure and Culture · Software Delivery Approach
B. Recommended Reading
Part I: Context · Part II: Organizational Structure and Culture · Part III: Building and Operating
Index
About the Author

Content preview from Enabling Microservice Success

Chapter 13. Running Your System in Production

As I discussed in Chapter 8, engineering teams have to get much more involved in running the services they build when they adopt a microservice architecture. If you are releasing multiple changes a day, handing releases and production support over to another team would slow you down too much.

That means building systems differently, but it also means operating them in production: the focus of this chapter.

Microservices bring particular operational challenges: they are distributed systems, with many parts, and what the system as a whole looks like will generally be very different from what it looked like six months ago. As a result, documentation struggles to keep up.

I am going to start this chapter with an overview of these operational challenges, to set the context for the following sections that will provide ways to tackle them.

That starts with observability. Operating microservices is made a lot easier if you have built observability into your services and the system, and we’ll go over how to do that effectively. You should also consider building your own utilities and tools to give further insights into your system or to help with fixing problems.

Observability is about being able to infer what is going on in your system from the external outputs: logs, traces, metrics, etc. To successfully run your system in production, you also need to be able to work out when you have an issue. Easier said than done, because all the resilience ...
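As a rough illustration of what "external outputs" can look like in practice, here is a minimal Python sketch using only the standard library. It is not taken from the book: the service name `orders-service`, the `request_id` field, the `handle_request` function, and the in-process `request_counts` counter are all hypothetical names chosen for this example. It emits one JSON log event per line, tagged with a request ID so that events from the same request can be correlated, and keeps a simple count of outcomes as a stand-in for a metric.

```python
import io
import json
import logging
import uuid
from collections import Counter


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        event = {
            "level": record.levelname,
            # These two fields are supplied through `extra=`; names are hypothetical.
            "service": getattr(record, "service", "unknown"),
            "request_id": getattr(record, "request_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(event)


def build_logger(stream, name="orders-service"):
    """Create a logger that writes JSON events to the given stream."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Hypothetical in-process metric; a real system would export this somewhere.
request_counts = Counter()


def handle_request(logger, service, succeed=True, request_id=None):
    """Pretend to handle a request, logging events and counting the outcome."""
    request_id = request_id or str(uuid.uuid4())
    context = {"service": service, "request_id": request_id}
    logger.info("request received", extra=context)
    request_counts["success" if succeed else "error"] += 1
    if succeed:
        logger.info("request completed", extra=context)
    else:
        logger.error("request failed", extra=context)
    return request_id


# Demo: two requests written to an in-memory stream, then parsed back.
stream = io.StringIO()
logger = build_logger(stream)
handle_request(logger, "orders-service", succeed=True, request_id="req-1")
handle_request(logger, "orders-service", succeed=False, request_id="req-2")
events = [json.loads(line) for line in stream.getvalue().splitlines()]
failed = [e["request_id"] for e in events if e["level"] == "ERROR"]
```

Because every event carries the same `request_id`, filtering the parsed events by that field reconstructs what happened to a single request, and `request_counts` gives a coarse view of how many requests failed. Traces are left out of this sketch.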

Publisher Resources

ISBN: 9781098130787
Errata Page
