This pattern helps us maintain the resilience and responsiveness of the system by controlling the number of requests a service accepts. Most modern servers provide a request queue, which can be configured with the number of requests to hold before further requests are dropped and a server-busy message is sent back to the calling entity. We extend this approach to the service level: every service should sit behind a queue that holds the requests waiting to be served.
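The idea of a bounded request queue in front of a service can be sketched as follows. This is a minimal illustration, not a production implementation: the names `BoundedService`, `ServiceBusy`, `submit`, and `process_next` are hypothetical, and the capacity of 2 is chosen only to keep the example short.

```python
import queue


class ServiceBusy(Exception):
    """Raised when the request queue is full and a request must be dropped."""


class BoundedService:
    """Hypothetical service front end that queues requests up to a fixed capacity."""

    def __init__(self, capacity):
        # A queue.Queue with maxsize > 0 refuses new items once it is full.
        self._requests = queue.Queue(maxsize=capacity)

    def submit(self, request):
        """Queue the request if there is room; otherwise signal server-busy."""
        try:
            self._requests.put_nowait(request)
        except queue.Full:
            raise ServiceBusy("server busy, try again later") from None

    def process_next(self):
        """Take the oldest queued request and handle it (here we only echo it)."""
        request = self._requests.get_nowait()
        return f"handled {request}"

    def pending(self):
        """Number of requests currently waiting in the queue."""
        return self._requests.qsize()


# Usage: the third request is rejected because the queue already holds two.
service = BoundedService(capacity=2)
service.submit("r1")
service.submit("r2")
try:
    service.submit("r3")
    rejected = False
except ServiceBusy:
    rejected = True
```

Rejecting the request immediately, rather than blocking the caller until space frees up, is what keeps the service responsive under load: the caller gets a fast server-busy answer and can retry or fall back.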
The queue should have a fixed size, equal to the number of requests the service can handle in a specific amount of time, say, one minute. For example, if we know that service X can handle 500 requests in one minute, we should ...