System Design on AWS

by Jayanth Kumar, Mandeep Singh

February 2025

Intermediate to advanced

612 pages

19h 18m

English

O'Reilly Media, Inc.

Content preview from System Design on AWS

Chapter 13. Big Data, Analytics, and Machine Learning Services

Modern applications generate data at enormous volume. For an online food-ordering application, this data ranges from the profiles of all registered users to user actions captured in real time. Data generated at this scale is referred to as big data. If your use case is to store this data, you can choose among the storage solutions discussed in Chapter 10 based on your requirements. This chapter focuses on processing data at high volume: how can we generate insights from data already in storage, or from live streaming data, by running data analytics or ML models on top of it? For example, we might want to determine the most-ordered food item by location or the highest-rated restaurant in a particular locality.
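To make the two example questions concrete, here is a minimal in-memory sketch in Python. The record layout, the sample data, and the function names are all hypothetical and are not taken from the book. At real big data volume, the same aggregations would run on a distributed engine or a query service rather than in a single process.

```python
from collections import Counter, defaultdict

# Hypothetical order records: (location, restaurant, item, rating).
ORDERS = [
    ("Downtown", "Pasta Place", "lasagna", 4.5),
    ("Downtown", "Pasta Place", "lasagna", 4.0),
    ("Downtown", "Burger Barn", "cheeseburger", 3.5),
    ("Uptown", "Curry Corner", "biryani", 5.0),
    ("Uptown", "Curry Corner", "biryani", 4.0),
    ("Uptown", "Burger Barn", "cheeseburger", 4.0),
    ("Uptown", "Curry Corner", "naan", 4.5),
]


def most_ordered_item_by_location(orders):
    """Return {location: item ordered most often in that location}."""
    counts = defaultdict(Counter)
    for location, _restaurant, item, _rating in orders:
        counts[location][item] += 1
    return {loc: c.most_common(1)[0][0] for loc, c in counts.items()}


def top_rated_restaurant(orders, locality):
    """Return the restaurant with the highest mean rating in a locality,
    or None if the locality has no orders."""
    totals = defaultdict(lambda: [0.0, 0])  # restaurant -> [rating sum, count]
    for location, restaurant, _item, rating in orders:
        if location == locality:
            totals[restaurant][0] += rating
            totals[restaurant][1] += 1
    if not totals:
        return None
    return max(totals, key=lambda r: totals[r][0] / totals[r][1])


if __name__ == "__main__":
    print(most_ordered_item_by_location(ORDERS))
    print(top_rated_restaurant(ORDERS, "Uptown"))
```

Both functions make a single pass over the records and keep only per-group counters, which is the same group-then-aggregate shape that big data processing engines distribute across many machines.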

The first part of this chapter introduces AWS big data, live streaming, and analytics services such as Amazon Elastic MapReduce (EMR), AWS Glue, Amazon Athena, Amazon QuickSight, and Amazon Redshift. The second part explores how you can run ML workloads on AWS and the services that support them.

AWS Big Data and Analytics

Information is vital to making business decisions and serving customers better, but the volume of data is growing rapidly, from terabytes to petabytes and beyond. The variety of data is also increasing: data can arrive in any form. We require specific tools to store and process big data. Traditional ...