Chapter 4. Limitations of Serverless
So far we’ve talked about what Serverless is and how we got here, shown you what Serverless applications look like, and told you the many wonderful ways that Serverless will make your life better. It’s been all smiles up to this point, but now we need to tell you some hard truths.
Serverless is a different way of building and operating systems, and just like with most alternatives, there are limitations as well as advantages. Add to that the fact that Serverless is still new—AWS Lambda is the most mature FaaS platform, and its first, very limited version was only launched in late 2014.
All of this innovation and novelty comes with some big caveats: not everything works brilliantly well, and even for the parts that do, we haven’t yet figured out the best ways of using them. Furthermore, there are some tradeoffs implicit in using such an approach, which we discuss first.
Some of the limitations of Serverless just come with the territory—we’re never going to completely get around them. These are inherent limitations. Over time we’ll learn better how to work around these, or in some cases even to embrace them.
It may seem obvious, but in a Serverless application, the management of state can be somewhat tricky. Aside from the components that are explicitly designed to be data stores, most Serverless components are effectively stateless. While this chapter is specifically about limitations, it’s worth mentioning that one benefit of that statelessness is that scaling those components simply becomes a matter of increasing concurrency, rather than giving each instance of a component (like an AWS Lambda function) more resources.
However, the limitations are certainly clear as well. Stateless components must, by definition, interact with other, stateful components to persist any information beyond their immediate lifespan. As we’ll talk about in the very next section, that interaction with other components inevitably introduces latency, as well as some complexity.
What’s more, stateful Serverless components may have very different ways of managing information between vendors. For example, a BaaS product like Firebase, from Google, has different data expiry mechanisms and policies than a similar product like DynamoDB, from AWS.
Also, while statelessness is the fundamental rule in many cases, oftentimes specific implementations, especially FaaS platforms, do preserve some state between function invocations. This is purely an optimization and cannot be relied upon as it depends heavily on the underlying implementation of the platform. Unfortunately, it can also confuse developers and muddy the operational picture of a system. One knock-on effect of this opportunistic state optimization is that of inconsistent performance, which we’ll touch on later.
In a non-Serverless application, if latency between application components is a concern, those components can generally be reliably co-located (within the same rack, or on the same host instance), or can even be brought together in the same process. Also, communication channels between components can be optimized to reduce latency, using specialized network protocols and data formats.
Successful early adopters of Serverless, however, advocate having small, single-purpose FaaS functions, triggered by events from other components or services. Much of the inter-component communication in these systems happens via HTTP APIs, which can be slower than other transports. Interaction with BaaS components also follows a similar flow. The more components communicating over un-optimized channels, the more latency will be inherent in a Serverless application.
Another impact on latency is that of “cold starts,” which we’ll address a little later in this chapter.
While Serverless platform providers are always improving the performance of their underlying infrastructure, the highly distributed, loosely coupled nature of Serverless applications means that latency will always be a concern. For some classes of problems, a Serverless approach may not be viable based on this limitation alone.
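A rough way to reason about this is to treat a request as a sequential chain of components, each adding its own work plus a fixed communication overhead per hop. The Python sketch below uses made-up numbers (they are assumptions, not measurements from any platform) to show how the per-hop overhead can come to dominate the total.

```python
def request_latency_ms(work_ms, per_hop_overhead_ms):
    """Latency of a sequential chain of components.

    work_ms: time each component spends on its own work, in milliseconds.
    per_hop_overhead_ms: assumed fixed cost of reaching each component
    (network round trip, HTTP handling, serialization).
    """
    return sum(work_ms) + per_hop_overhead_ms * len(work_ms)


# Illustrative, invented figures for four components on one request path.
work = [5, 10, 5, 20]

# Same logic brought together in one process: no communication overhead.
in_process = request_latency_ms(work, per_hop_overhead_ms=0)

# Same logic split into four components called over HTTP.
over_http = request_latency_ms(work, per_hop_overhead_ms=15)
```

With these figures the in-process version takes 40 ms and the distributed version 100 ms, even though the useful work is identical. Real systems are messier (calls may run in parallel, and overhead varies per hop), so this is only a first approximation.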
The difficulty of local testing is one of the most jarring limitations of Serverless application architectures. In a non-Serverless world, developers often have local analogs of application components (like databases, or message queues) which can be integrated for testing in much the same way the application might be deployed in production. Serverless applications can, of course, rely on unit tests, but more realistic integration or end-to-end testing is significantly more difficult.
The difficulties in local testing of Serverless applications can be classified in two ways. Firstly, because much of the infrastructure is abstracted away inside the platform, it can be difficult to connect the application components in a realistic way, incorporating production-like error handling, logging, performance, and scaling characteristics. Secondly, Serverless applications are inherently distributed, and consist of many separate pieces, so simply managing the myriad functions and BaaS components is challenging, even locally.
Instead of trying to perform integration testing locally, we recommend doing so remotely. This makes use of the Serverless platform directly, although that too has limitations, as we’ll describe in the next section.
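Unit tests do remain practical locally, particularly if a function receives its dependencies rather than creating them itself. The following Python sketch shows one way this might look; `InMemoryTable` and `make_handler` are hypothetical names, and the in-memory class imitates none of the error handling, performance, or scaling behavior of a real managed data store, which is exactly the gap that remote integration testing has to cover.

```python
class InMemoryTable:
    """Hypothetical local stand-in for a managed key-value store."""

    def __init__(self):
        self.items = {}

    def put(self, key, value):
        self.items[key] = value

    def get(self, key):
        return self.items.get(key)


def make_handler(table):
    """Build a function handler with its data store injected."""

    def handler(event, context):
        # Persist the payload in whichever store was supplied.
        table.put(event["id"], event["payload"])
        return {"status": "stored", "id": event["id"]}

    return handler


# Local unit test: exercise the handler against the in-memory stand-in.
table = InMemoryTable()
handler = make_handler(table)
result = handler({"id": "order-1", "payload": {"total": 42}}, None)
```

In a deployed function, the same `make_handler` could be given a thin wrapper around the real data store's client instead.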
Loss of Control
Many of the limitations of Serverless are related to the reality that the FaaS or BaaS platform itself is developed and operated by a third party.
In a non-Serverless application, the entirety of the software stack may be under our control. If we’re using open source software, we can even download and alter components from the operating system boot loader to the application server. However, such breadth of control is a double-edged sword. By altering or customizing our software stack, we take on implicit responsibility for that stack and all of the attendant bug fixes, security patches, and integration. For some use cases or business models, this makes sense, but for most, ownership and control of the software stack distracts focus from the business logic.
Going Serverless inherently involves giving up full control of the software stack on which code runs. We’ll describe how that manifests itself in the remainder of this section.
Loss of control: configuration
An obvious limitation of Serverless is a loss of absolute control over configuration. For example, in the AWS Lambda FaaS platform, there are a very limited number of configuration parameters available, and no control whatsoever over JVM or operating system runtime parameters.
BaaS platforms are no different in this respect. The platform provider may expose some configuration, but it’s likely to be limited or abstracted from however the actual underlying software is configured.
Loss of control: performance
Coupled closely with loss of control over configuration is a similar loss of control over the performance of Serverless components. The performance issue can be further broken down into two major categories: performance of application code and performance of the underlying Serverless platform.
Serverless platforms hide the details of program execution, in part due to the multiple layers of virtualization and abstraction that allow the platform operators to efficiently utilize their physical hardware. If you have access to the physical hardware, core operating system, and runtime, it is straightforward to optimize your application code for peak performance on that hardware and software foundation. If your code is running in a container, which is itself running on a virtual server (like an EC2 instance), it becomes much more difficult to predict or optimize how your code might perform.
In observations of benchmarking code running on the AWS Lambda platform, we see that identically configured Lambdas can have drastically different performance characteristics. Those characteristics can vary for the same Lambda over the course of minutes or hours, as the underlying platform alters scheduling priorities and resource allocations in response to demand.
Similarly, the performance of BaaS platforms can be inconsistent from one request to the next. In combination with the loss of control over configuration, that inconsistency can be frustrating to encounter, especially when there are few options for resolution outside of raising a support ticket with the platform provider.
Loss of control: issue resolution
Once that support ticket is opened, however, who has the capability to resolve it? Issue resolution is another area in which we cede control to a vendor.
In a fully controlled system, if a hardware component has a fault, or the operating system requires a security patch, the owner of the system can take action to resolve issues. This extends into any infrastructure that the owner of the system also controls. In the case of noncritical issues, the system owner might choose to delay downtime or a maintenance window to a convenient time, perhaps when there is less load on the system or when a backup system might be available.
In a Serverless world, the only issues we can resolve are those within our application code, or issues due to the configuration of Serverless components and services. All other classes of issues must be resolved by the platform owner, and we may not even know when or if an issue has occurred. AWS is well known for a lack of visibility into most issues with their underlying platforms, even serious ones. The AWS status page displays a sea of green checkmarks, and only a sharp eye will pick out the occasional italicized “i” next to a green checkmark. That innocuous-looking “i” represents nearly every state from “the service had a few sporadic errors” to “an earthquake destroyed a data center.” While that seems like an understatement, it is also a testament to the global scale and resilience of the AWS infrastructure that the loss of a data center is not necessarily a catastrophic event.
Loss of control: security
The last major aspect of loss of control that we’re going to cover is security. As with issue resolution, the only opportunity to affect the security of a Serverless application is through the mechanisms supplied by the platform provider. These mechanisms often take the form of platform-specific security features instead of operating system level controls. Unfortunately, but unsurprisingly, those platform-specific security features are not generally compatible or transferable between platforms.
Furthermore, platform security controls may not meet the security requirements of your application. For example, at the time of writing, all AWS API Gateways can be reached from anywhere on the public internet; access is controlled solely via API keys rather than any transport-based access controls. However, many internal applications are locked down via network controls. If an application should only be accessible from certain IP addresses, then API Gateway cannot be used.
In contrast to all of the previous inherent limitations, implementation limitations are those that are a fact of Serverless life for now, but which should see rapid improvement as the Serverless ecosystem improves and as the wider Serverless community gains experience in using these new technologies.
As we alluded to earlier, Serverless platforms can have inconsistent and poorly documented performance characteristics.
One of the most common performance issues is referred to as a cold start. On the AWS Lambda platform, this refers to the instantiation of the container in which our code is run, as well as some initialization of our code. These slower cold starts occur when a Lambda function is invoked for the first time or after having its configuration altered, when a Lambda function scales out (to more instances running concurrently), or when the function simply hasn’t been invoked in a while.
Once a container is instantiated, it can handle events without undergoing that same instantiation and initialization process. These “warm” invocations of the Lambda function are much faster. On the AWS Lambda platform, regularly used containers stay warm for hours, so in many applications cold starts are infrequent. For an AWS Lambda function processing at least one event per second, more than 99.99% of events should be processed by a warm container.
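The 99.99% figure can be sanity-checked with some back-of-the-envelope arithmetic. The Python sketch below assumes, purely for illustration, that a single container serves every event and that it is replaced (causing one cold start) after a fixed number of hours. Neither assumption is a documented guarantee of any platform.

```python
def warm_fraction(events_per_second, container_lifetime_hours):
    """Fraction of events served by a warm container.

    Assumes one container handles all events and that exactly one event
    per container lifetime (the first) pays the cold-start penalty.
    """
    events_per_container = events_per_second * container_lifetime_hours * 3600
    return 1 - 1 / events_per_container


# One event per second, container assumed to live for three hours:
# 10,800 events per container, of which one is cold.
three_hour_case = warm_fraction(events_per_second=1, container_lifetime_hours=3)

# Same traffic, but a container that is assumed to live only two hours.
two_hour_case = warm_fraction(events_per_second=1, container_lifetime_hours=2)
```

Under these assumptions a three-hour lifetime gives roughly 99.991% warm invocations, while a two-hour lifetime gives roughly 99.986%. At one event per second, a container has to survive about 2.8 hours (10,000 events) for the warm fraction to reach 99.99%.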
The difference between the “cold” and “warm” performance of FaaS functions makes it difficult to consistently predict performance, but as platforms mature, we feel that these limitations will be minimized or addressed.
Given the newness of Serverless technologies, it’s no surprise that tooling around deployment, management, and development is still in a state of infancy. While there are some tools and patterns out there right now, it’s hard to say which tools and patterns will ultimately emerge as future “best practices.”
Tooling limitations: deployment tools
Serverless deployment tools interact with the underlying platform, usually via an API. Since Serverless applications are composed of many individual components, deploying an entire application atomically is generally not feasible. Because of that fundamental architectural difference, it can be challenging to orchestrate deployments of large-scale Serverless applications.
Tooling limitations: execution environments
One of the most well-publicized limitations of Serverless is the constrained execution environment of FaaS platforms. FaaS functions execute with limited CPU, memory, disk, and I/O resources, and unlike legacy server processes, cannot run indefinitely. For example, at the time of writing, AWS Lambda functions can execute for a maximum of five minutes before being terminated by the platform, and are limited to a maximum of 1.5 GB of memory.
As the FaaS platform’s underlying hardware gets more powerful, we can expect these resource limits to increase (as they already have in some cases). Further, designing a system to work comfortably within these limits often leads to a more scalable architecture.
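One common way to work within an execution time limit is to check how much time is left and stop cleanly before the platform terminates the function. The Python sketch below relies on `get_remaining_time_in_millis()`, which the AWS Lambda Python runtime provides on its context object. `FakeContext`, `SAFETY_MARGIN_MS`, and the doubling "work" are hypothetical stand-ins so that the example can run locally.

```python
import time


class FakeContext:
    """Local stand-in for the platform-supplied context object."""

    def __init__(self, timeout_ms):
        self._deadline = time.monotonic() + timeout_ms / 1000.0

    def get_remaining_time_in_millis(self):
        return max(0, int((self._deadline - time.monotonic()) * 1000))


SAFETY_MARGIN_MS = 50  # hypothetical buffer left for returning a response


def handler(event, context):
    items = event["items"]
    processed = []
    for item in items:
        # Stop early rather than be cut off in the middle of an item.
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break
        processed.append(item * 2)  # placeholder for real work
    # Whatever was not processed is handed back to the caller.
    return {"processed": processed, "remaining": items[len(processed):]}
```

The unprocessed items in `remaining` could then be passed to a fresh invocation, which is one example of how designing around the limit nudges a system toward smaller, independently retryable units of work.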
Tooling limitations: monitoring & logging
One of the benefits of Serverless is that you’re no longer responsible for many host- or process-level aspects of an application, and so monitoring metrics like disk space, CPU, and network I/O is not necessary (or, in fact, supported in many situations). However, metrics more closely associated with actual business functionality still need to be monitored.
The extent to which monitoring is well supported in a Serverless environment is currently a mixed bag. As an example, AWS Lambda has a number of ways monitoring can be performed, but some of them are poorly documented, or at least poorly understood by most users. AWS also gives a default logging platform in CloudWatch Logs. CloudWatch Logs is somewhat limited as a log analysis platform (for example, searching over a number of different sources); however, it is fairly easy to export logs from CloudWatch to another system.
An area that is significantly lacking in support at present is distributed monitoring: the ability to understand what is happening to a business request as it is processed by a number of components. This kind of monitoring is under active development, since it’s also a concern for users of Microservices architectures, and Serverless systems will be much easier to operate once this kind of functionality is commonplace.
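Until platform support matures, one widely used workaround is to attach a correlation ID to each business request and include it in every log line, so that logs from different components can be stitched together afterward. The Python sketch below is a minimal illustration of that idea. The function names and the `correlation_id` field are our own assumptions, not part of any platform API.

```python
import json
import uuid


def with_correlation_id(event):
    """Return a copy of the event that carries a correlation ID.

    An ID that is already present is preserved, so the same value follows
    the request through every component that handles it.
    """
    enriched = dict(event)
    enriched.setdefault("correlation_id", str(uuid.uuid4()))
    return enriched


def log_line(component, message, event):
    """Format a structured log line that can be searched by correlation ID."""
    return json.dumps(
        {
            "component": component,
            "correlation_id": event["correlation_id"],
            "message": message,
        },
        sort_keys=True,
    )


# The first component assigns the ID; a downstream component keeps it.
incoming = with_correlation_id({"order_id": 7})
downstream = with_correlation_id(incoming)
```

Searching the exported logs for a single `correlation_id` value then yields the path of one business request across components, although without the timing and dependency detail that a real tracing system provides.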
Tooling limitations: remote testing
In stark contrast to the inherent limitations of testing Serverless applications locally, the difficulty of remote testing is merely an implementation limitation. Some Serverless platform providers do make some remote testing possible, but typically only at the component level (for example, an individual function), not at the Serverless application level.
It can be difficult to exhaustively test a complex Serverless application without setting up an entirely separate account with the platform provider, to ensure that testing does not impact production resources, and to ensure that account-wide platform limits are not exceeded by testing.
Tooling limitations: debugging
Debugging Serverless applications is still quite difficult, although due to their often stateless nature, there is less to be gained in introspection and runtime debugging. However, for thorny problems, there is no replacement for a runtime debugger that allows introspection and line-by-line stepping.
At the time of this writing, there is no production-ready capability to remotely debug AWS Lambda functions. Microsoft Azure Functions written in C# can be remotely debugged from within the Visual Studio development environment, but this capability doesn’t exist for the other Azure Function language runtimes.
In addition to the limitations of debugging Serverless compute components, debugging Serverless applications as a whole is difficult, as it is with any distributed application. Services like AWS X-Ray are starting to enable distributed tracing of messages across Serverless infrastructure and components, but those tools are in their infancy. Third-party solutions do exist, but come with their own set of concerns and caveats, including integration challenges, performance impact, and cost. Of course, given the initial steps in this area, we can anticipate more progress in the near future.
Vendor lock-in seems like an obviously inherent limitation of Serverless applications. However, different Serverless platform vendors enforce different levels of lock-in, through their choice of integration patterns, APIs, and documentation. Application developers can also limit their use of vendor-specific features, admittedly with varying degrees of success depending on the platform.
For example, AWS services, while mostly closed-source and fully managed, are well documented, and at a high level can be thought of in abstract terms. DynamoDB can be thought of as simply a high-performance key-value store. SQS is simply a message queue, and Kinesis is an ordered log. Now, there are many specifics around the implementation of those services which make them AWS-specific, but as high-level components within a larger architecture, they could be switched out for other, similar components from other vendors.
That being said, we of course must also acknowledge that much of the value of using a single Serverless vendor is that the components are well integrated, so to some extent the vendor lock-in is not necessarily in the components themselves, but in how they can be tied together easily, performantly, and securely.
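As a rough illustration of treating a vendor service as an abstract component, the Python sketch below has application code depend on a small, vendor-neutral queue interface. `MessageQueue`, `InMemoryQueue`, and `enqueue_order` are hypothetical names. An adapter for a real managed queue would implement the same two methods, and would be where the vendor-specific details end up concentrated.

```python
from abc import ABC, abstractmethod
from collections import deque


class MessageQueue(ABC):
    """Vendor-neutral interface that application code depends on."""

    @abstractmethod
    def send(self, message):
        """Add a message to the queue."""

    @abstractmethod
    def receive(self):
        """Return the next message, or None if the queue is empty."""


class InMemoryQueue(MessageQueue):
    """Simple implementation, useful for tests or as a placeholder."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft() if self._messages else None


def enqueue_order(queue, order_id):
    # Application logic only knows about the abstract interface.
    queue.send({"type": "order_created", "order_id": order_id})
```

An interface like this limits how far vendor-specific calls spread through the code, but it does nothing about the integration-level lock-in described above, such as event triggers, permissions, and delivery guarantees that differ between vendors.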
On the other side of the vendor spectrum from AWS are platforms like Apache OpenWhisk, which is completely open source and ostensibly not tied to any single vendor (although much of its development is done by IBM to enable their fully managed platform).
BaaS components, though, are somewhat more of a mixed bag. For example, AWS’s S3 service has a published API specification, and other vendors like Dreamhost provide object storage systems that are API-compatible with S3.
Immaturity of Services
Some types of Serverless services, especially FaaS, work better with a good ecosystem around them. We see that clearly with the various services that AWS has built, or extended, to work well with Lambda.
Some of these services are new and still need to have a few more revisions before they cover a lot of what we might want to throw at them. API Gateway, for example, has improved substantially in its first 18 months but still doesn’t support certain features we might expect from a universal web server (e.g., WebSockets), and some features it does have are difficult to work with.
Similarly, we see brand-new services (at time of writing) like AWS Step Functions. This is a product that’s clearly trying to solve an architectural gap in the Serverless world, but is very early in its capabilities.
We’ve covered the inherent and implementation limitations of Serverless in a fairly exhaustive way. The inherent limitations, as we discussed, are simply the reality of developing and operating Serverless applications in general, and some of these limitations are related to the loss of control inherent in using a Serverless or cloud platform. While there may be some standardization in how we interact with platforms from different vendors, we’re still ceding a substantial amount of control to the provider. Also, in some cases we’ll find workarounds or opportunities for standardization (for example, AWS’s Serverless Application Model, aka SAM).
The implementation limitations are also significant, but for the most part we can look forward to these limitations being addressed by platform providers and the wider community. As we gain collective experience in building and running Serverless applications, we will see most of these implementation limitations fall by the wayside in favor of well-designed and well-considered solutions.