Chapter 6. MapReduce Pattern

This chapter focuses on applying the MapReduce data processing pattern.

Note

MapReduce in this chapter is explicitly tied to the use of Hadoop since that helps pin down its capabilities and avoid confusion with other variants. The term MapReduce is used except when directly referencing the Hadoop project (which is introduced below).

MapReduce is a data processing approach that presents a simple programming model for processing highly parallelizable data sets. It is implemented as a cluster, with many nodes working in parallel on different parts of the data. There is significant overhead in starting a MapReduce job, but once the job is under way, it can complete rapidly relative to conventional approaches.

MapReduce requires writing two functions: a mapper and a reducer. Each function accepts data as input and returns transformed data as output. The functions are called repeatedly, on subsets of the data, with the output of the mappers grouped by key and then passed to the reducers. These two phases sift through large volumes of data a little bit at a time.
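To make the two roles concrete, the sketch below is modeled on the canonical word-count example from the Hadoop tutorials: the mapper emits a count of 1 for every word it encounters, and the reducer sums the counts for each word. The class and field names (WordCount, TokenizerMapper, IntSumReducer) are illustrative, not required by the framework.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch modeled on the canonical Hadoop word-count tutorial example.
public class WordCount {

  // Mapper: called once per line of input; emits (word, 1) for each word.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: called once per distinct word, with all of that word's
  // counts; emits (word, total).
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable count : values) {
        sum += count.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configures the job and points it at input and output paths.
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Between the two phases, the framework shuffles the mapper output so that all values for a given key arrive at the same reducer; that grouping is what the reduce function relies on. A common refinement in Hadoop is to also register the reducer as a combiner (job.setCombinerClass) so partial sums are computed on the mapper nodes, reducing the volume of data shuffled across the network.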

MapReduce is designed for batch processing of data sets. The limiting factor is the size of the cluster, not the code: the same map and reduce functions that work on a very small data set do not need to change as the data grows from kilobytes to megabytes to gigabytes to petabytes.
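Because the framework, not the code, absorbs that growth, the same packaged job is submitted the same way regardless of input size; only the cluster behind it changes. Assuming the word-count sketch above has been compiled into a JAR named wordcount.jar (a hypothetical name), submitting it to Hadoop looks like this:

hadoop jar wordcount.jar WordCount /input/documents /output/counts

The input and output paths here are placeholders; note that Hadoop requires the output directory to not already exist.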

Some examples of data that MapReduce can easily be programmed to process include text documents (such as all the documents ...
