Instance Pooling/Caching
Because of the strict concurrency rules enforced by the Container, an intentional bottleneck is often introduced: a service instance may not be available for processing until some other request has completed. If the service were restricted to a single instance, all subsequent requests would have to queue until their turn came (see Figure 3-1).

Figure 3-1. Client requests queuing for service
Conversely, if the service were permitted to use any number of underlying instances, nothing would limit how many requests could be processed concurrently, and the physical machine could slow to a crawl as its resources were spread too thin (Figure 3-2).

Figure 3-2. Many invocations executing concurrently with no queuing policy
EJB addresses this problem through a technique called instance pooling, in which each module is allocated some number of instances with which to serve incoming requests (Figure 3-3). Many vendors provide configuration options that allow the deployer to allocate pool sizes appropriate to the work being performed, providing the compromise needed to achieve optimal throughput.

Figure 3-3. A hybrid approach using a pool
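To make the idea concrete, the following is a minimal sketch of a fixed-size pool in plain Java. It is not how any particular Container implements pooling, and the names used here (InstancePool, invoke, available) are hypothetical and not part of the EJB API. It assumes a pool size of at least one. Each request borrows an idle instance, blocks when none is free, and returns the instance when finished, so at most "size" requests run at once while the rest wait their turn.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical fixed-size instance pool; an illustration only, not an EJB API. */
final class InstancePool<T> {

    // Holds the instances that are currently idle and ready to serve a request.
    private final BlockingQueue<T> idle;

    /** Creates {@code size} instances up front (size must be at least 1). */
    InstancePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Borrows an instance, runs the request against it, and always returns it. */
    <R> R invoke(Function<T, R> request) {
        final T instance;
        try {
            // Callers queue here while every instance is busy.
            instance = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for an instance", e);
        }
        try {
            return request.apply(instance);
        } finally {
            // Cannot overflow: the queue capacity equals the number of instances.
            idle.add(instance);
        }
    }

    /** Number of instances not currently serving a request. */
    int available() {
        return idle.size();
    }
}
```

A pool of two instances, for example, could be created with new InstancePool<>(2, StringBuilder::new). A third concurrent caller of invoke would then wait until one of the first two requests completed. The pool size plays the role of the vendor-specific configuration option described above.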
Instance ...