Chapter 1. Scalability Primer
This primer explains scalability with an emphasis on the key differences between vertical and horizontal scaling.
Scaling is about allocating resources for an application and managing those resources efficiently to minimize contention. The user experience (UX) suffers when an application requires more resources than are available. The two primary approaches to scaling are vertical scaling and horizontal scaling. Vertical scaling is the simpler approach, though it is more limiting. Horizontal scaling is more complex, but it can reach scales far beyond what is possible with vertical scaling. Horizontal scaling is the more cloud-native approach.
This chapter assumes we are scaling a distributed multi-tier web application, though the principles are also more generally applicable.
Note
This chapter is not specific to the cloud except where explicitly stated.
Scalability Defined
The scalability of an application is a measure of the number of users it can effectively support at the same time. The point at which an application cannot handle additional users effectively is the limit of its scalability. Scalability reaches its limit when a critical hardware resource runs out, though scalability can sometimes be extended by providing additional hardware resources. The hardware resources needed by an application usually include CPU, memory, disk (capacity and throughput), and network bandwidth.
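To make the idea of a limiting resource concrete, the following is a minimal sketch, not a capacity-planning tool. It assumes each concurrent user consumes a fixed amount of every resource, which is a simplification, and all names and figures (Resources, max_concurrent_users, the capacities and per-user demands) are hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource quantities; units are encoded in the field names."""
    cpu_millicores: int
    memory_mb: int
    disk_mbps: int
    network_mbps: int


def max_concurrent_users(capacity: Resources, per_user: Resources):
    """Return (user_limit, bottleneck_resource) under a fixed per-user demand.

    The limit is set by whichever resource runs out first.
    """
    limits = {
        f.name: getattr(capacity, f.name) / getattr(per_user, f.name)
        for f in fields(Resources)
    }
    bottleneck = min(limits, key=limits.get)
    return math.floor(limits[bottleneck]), bottleneck


# Illustrative figures only (assumptions, not measurements).
capacity = Resources(cpu_millicores=8000, memory_mb=32768,
                     disk_mbps=800, network_mbps=1000)
per_user = Resources(cpu_millicores=20, memory_mb=50,
                     disk_mbps=1, network_mbps=2)

baseline = max_concurrent_users(capacity, per_user)

# Doubling the CPU extends the limit only until the next resource runs out.
more_cpu = replace(capacity, cpu_millicores=16000)
extended = max_concurrent_users(more_cpu, per_user)

print(baseline)
print(extended)
```

With these assumed figures, CPU is exhausted first at 400 users. After the CPU is doubled, the limit rises to 500 users and network bandwidth becomes the constraint: adding hardware extends scalability, but only up to the next critical resource.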
An application runs on multiple nodes, which have ...