Chapter 15. Case Study: Scaling Server Instances
Consider a data center. How many server instances do you need to spin up? Just enough to handle incoming requests, right? But precisely how many instances will be “enough”? And what if the traffic intensity changes? Especially in a “cloud”-like deployment situation—where resources can come and go and we only pay for the resources actually committed—it makes sense to exercise control constantly and automatically.
The situation sketched in the paragraph above is common enough, but it can describe quite a variety of circumstances, depending on the specifics. Details matter! For now, we will assume the following.
Control action is applied periodically—say, once every second, on the second.
In the interval between successive control actions, requests come in and are handled by the servers. If there aren’t enough servers, then some of the requests will not be answered (“failed requests”).
Requests are not queued. Any request that is not immediately handled by a server is “dropped.” There is no accumulation of pending requests.
The numbers of incoming and answered requests during each interval are recorded and available to the controller.
The number of requests that arrive during each interval is (of course) a random quantity, as is the number of requests that each server handles. In addition to the second-to-second variation in the number of incoming requests, we also expect slow “drifts” in traffic intensity. These drifts typically take place ...
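The per-interval dynamics described above can be sketched in a few lines of code. The following is an illustrative model only, with assumed names and distributions (Gaussian arrivals clipped at zero, a fixed per-server capacity), not the book's exact simulation: during each interval a random number of requests arrives, the active servers handle as many as their combined capacity allows, and the rest are dropped without queuing.

```python
import random

def simulate_interval(servers, traffic_mean, traffic_std, per_server_capacity, rng):
    """One control interval of the hypothetical data-center model.

    Arrivals are drawn from a Gaussian (clipped at zero) as a stand-in for
    random traffic; each server handles at most per_server_capacity requests.
    Requests beyond total capacity are dropped -- there is no queue.
    """
    arrived = max(0, round(rng.gauss(traffic_mean, traffic_std)))
    capacity = servers * per_server_capacity
    completed = min(arrived, capacity)   # served within this interval
    failed = arrived - completed         # dropped: no accumulation of pending work
    return arrived, completed, failed

# Example: 8 servers facing a mean load of 950 requests per interval.
rng = random.Random(42)
arrived, completed, failed = simulate_interval(
    servers=8, traffic_mean=950, traffic_std=100, per_server_capacity=120, rng=rng)
```

The counts `arrived` and `completed` correspond to the per-interval measurements assumed to be available above; their ratio (the completion rate) is a natural signal for a controller that decides how many instances to run in the next interval.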