Chapter 1. What Are Application Delivery and Load Balancing, and Why Are They Important?
When anyone uses an application, they expect it to respond quickly, efficiently, and reliably. When a user encounters errors, outages, or overcapacity messages, they generally don’t wonder why. They get annoyed, quit using the application, and ultimately complain about the application’s owner on social media. It doesn’t matter that the company was having a great day and that requests to its servers went through the roof. What matters is that the user’s request resulted in a failure of the application and, in the user’s eyes, the company providing the application.
As more companies move their applications and services to the cloud, responding to variable demands and workloads has become increasingly important. In this chapter, we’ll introduce the concepts behind application delivery and load balancing by explaining the purpose of application delivery controllers and load balancers and the problems they solve. Subsequent chapters will explore how you can use Microsoft Azure and NGINX for high-performance application delivery and load balancing.
Application Delivery Controllers
Put simply, application delivery is the process of ensuring that an application is delivered properly and efficiently to its clients, no matter the load or what is happening behind the scenes with processing and availability. Servers can fail for any number of reasons, from excess demand to security breaches to simple mechanical failure. When that server is solely responsible for delivering an application that customers or employees rely on, the application fails with it. Organizations need ways to stay adaptable and provide optimal performance and availability in any given situation.
At the heart of modern web application delivery is a data plane, which makes possible the application’s delivery to the client. Modern data planes are typically made up of reverse proxies working together to provide an optimal experience to the user. An advanced proxy with routing, authentication, and security controls is often referred to as an application delivery controller (ADC). ADCs help ensure maximum performance and capacity by sitting between the user and the application servers, directing valid requests only to servers that are currently online. Thus, an ADC offers a layer of control between the user experience and the application.
Hardware or Software
All ADCs are essentially software solutions that receive a network transmission and elevate it to the application layer for full control. Vendors sell bundled hardware/software black-box options for on-premises deployment. However, the hardware is not the special sauce; it’s the software the machine runs that gives you control over your application’s delivery. In today’s world, disruptors are those who can move quickly and adapt to changes in technology, which leaves no room for lengthy hardware procurement and installation processes.
If an ADC vendor does not offer a virtual appliance or software solution, you should reconsider your vendor. Compared to agility, hardware optimization and acceleration make a minimal impact on business value.
Structure and Function of ADCs
In general, an ADC accepts a request from a client and decides how best to serve that specific request based on rules in its configuration. An ADC may receive a request, validate its intended route and method, pass it through processing for security vulnerabilities, validate its authentication and authorization, and manipulate headers or the like before making a request on behalf of the client to an application server (one that the ADC knows is responsive) to fulfill the request.
This process may happen at multiple layers within a web application’s stack. An ADC positioned close to the client may proxy the request to another ADC positioned closer to the application servers. Each ADC controls how your application is delivered to the client at a different level, giving you maximum configurability.
An ADC may be in place for a specific application, or it may accept requests for a number of purposes, depending on your use case. At their core, ADCs are proxies, configured as necessary to deliver your application.
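To make this concrete, here is a minimal sketch of how such a proxy might look in NGINX configuration (NGINX is covered in depth later in this book). The hostname, certificate paths, and backend address are hypothetical:

    # A minimal ADC-style server block (belongs inside the http context).
    # Hostname, certificate paths, and addresses are hypothetical.
    server {
        listen 443 ssl;
        server_name app.example.com;

        ssl_certificate     /etc/nginx/certs/app.crt;
        ssl_certificate_key /etc/nginx/certs/app.key;

        # Validate the route: only /api/ requests are handled here.
        location /api/ {
            # Validate the method before proxying (GET also permits HEAD).
            limit_except GET POST {
                deny all;
            }

            # Manipulate headers on behalf of the client.
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;

            # Make the request to an application server on the client's behalf.
            proxy_pass http://10.0.0.10:8080;
        }

        # Reject requests outside the validated routes.
        location / {
            return 404;
        }
    }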
Load Balancers
Optimal load distribution reduces site inaccessibility caused by failure or stress of a single server while assuring consistent performance for all users. Different routing techniques and algorithms ensure optimal performance in varying load-balancing scenarios.
Modern websites must support concurrent connections from clients requesting text, images, video, or application data, all in a fast and reliable manner, while scaling from hundreds of users to millions of users during peak times. Load balancers are a critical part of this scalability.
Load balancers, introduced in the 1990s as hardware-based servers or appliances, have evolved considerably. Managed cloud load balancers are a modern alternative to hardware load balancers: they provide a configuration interface, but you don’t have to manage hardware, software, updates, or anything but the configuration. Regardless of how a load balancer is implemented, scalability is still the primary goal of load balancing, even though modern load balancers can do much more. Figure 1-1 shows a basic load-balancing solution. The client in any of these scenarios might be an end user’s browser, a mobile application, or another web service.
An ADC performs load balancing when multiple possible responders are configured for a given request. Many ADCs let you configure the algorithm they use to select which responder a given request is proxied to, which in turn is how they balance load.
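In NGINX terms, for example, the set of possible responders is declared as an upstream group; the sketch below uses placeholder addresses and relies on the default algorithm:

    # A hypothetical upstream group: with no algorithm specified,
    # NGINX balances requests across the servers using round robin.
    upstream app_servers {
        server 10.0.0.11:8080;
        server 10.0.0.12:8080;
    }

    server {
        listen 80;
        location / {
            # Each request is proxied to whichever server the
            # configured algorithm selects.
            proxy_pass http://app_servers;
        }
    }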
The OSI Model and Load Balancing
Before we discuss load balancing, especially in the cloud, in any more detail, it’s important to review the Open Systems Interconnection (OSI) model. This conceptual model provides a visual representation of the interoperation between systems that is universally applicable no matter what hardware or network characteristics are involved. The OSI model performs no functions in the networking process; it is a conceptual framework that helps us understand complex interactions. The model defines a networking framework of seven layers:
- Layer 7: Application layer
- Layer 6: Presentation layer
- Layer 5: Session layer
- Layer 4: Transport layer
- Layer 3: Network layer
- Layer 2: Data-link layer
- Layer 1: Physical layer
Network firewalls are security devices that operate from Layer 1 to Layer 3, whereas load balancing happens at Layer 4 and Layer 7. Load balancers have different capabilities, including the following:
- Layer 4 (L4)
- Directs traffic based on data from network and transport layer protocols, such as IP address and TCP port.
- Layer 7 (L7)
- Adds content switching to load balancing, allowing routing decisions based on attributes such as the HTTP header, URL, Secure Sockets Layer (SSL) session ID, and HTML form data (see the configuration sketch after this list).
- Global Server Load Balancing (GSLB)
- Extends L4 and L7 capabilities to servers in different geographic locations. The Domain Name System (DNS) is also used in certain solutions; this topic is addressed later, with Azure Traffic Manager as an example of such an implementation.
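As a brief illustration of L7 routing, the following NGINX sketch directs requests to different upstream groups based on the URL path; the pool names and addresses are assumptions made for the example:

    # L7 content switching: the URL path chooses the upstream group.
    upstream web_pool { server 10.0.1.11:8080; server 10.0.1.12:8080; }
    upstream api_pool { server 10.0.2.11:8080; server 10.0.2.12:8080; }

    server {
        listen 80;

        # Requests whose URL begins with /api/ go to the API servers.
        location /api/ {
            proxy_pass http://api_pool;
        }

        # Everything else goes to the web servers.
        location / {
            proxy_pass http://web_pool;
        }
    }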
Demand for more control over load balancers and the data plane in general is increasing dramatically as their capabilities become apparent to more organizations. This demand drives innovation in technology, which has given birth to the world of ADCs.
Problems Load Balancers Solve
Load balancing solves for capacity and availability, and in doing so paves the way for scalability. These concepts hold true at local and global levels. By balancing load at different layers, we’re able to direct client requests across multiple application servers, between multiple data centers, and over groups of data centers in different regions of the world.
Imagine—or maybe you don’t have to—that your application needs to perform well for users worldwide, which means it must be hosted in multiple geographically separated locations. You use global load balancing to direct a client request to the lowest-latency location hosting your application. For availability’s sake, that geographic location is made up of multiple data centers, and you use load balancing to distribute load between those data centers. For capacity’s sake, multiple servers within a data center are able to respond to a given request; if the load on a data center is too much for a single server to handle, you load balance across those servers.
In this scenario, we have three layers of load balancers to deliver a single globally scalable application. When a server is at peak capacity within a data center, there’s another server to help out. If a data center is at peak capacity or in failure, you have another data center to handle the load. In the case of an entire geographical location being out of service or at peak capacity, the load balancer makes the decision to route the request to another location. No matter the level your application is struggling at, your client’s request will still be fulfilled.
The Solutions Load Balancers Provide
The number of layers of load balancing depends on the needs of your application. How a load balancer determines where to direct a request is based on its algorithm. The algorithms provided by a load balancer depend on the solution; however, most offer a common set to fit your application and client needs (a configuration sketch follows the list):
- Round robin
- The default load-balancing method, in which requests are distributed across the list of servers sequentially.
- Weighted round robin
- Round robin for situations in which the capacities of the servers vary. Servers with higher weights are favored and receive a greater share of the traffic.
- Weighted least connections
- Like weighted round robin, this method assigns a weight to each server; the load balancer then routes traffic to the server with the fewest open connections, taking the weights into account.
- Hashing
- An algorithm generates a hash key from a header, the client’s IP address, or other request information and uses it to direct traffic to a specific server.
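The sketch below shows how several of these algorithms might be expressed in NGINX configuration; the addresses and weights are placeholders:

    # Weighted round robin: higher-weight servers receive a
    # proportionally greater share of requests.
    upstream weighted_rr {
        server 10.0.0.11:8080 weight=3;
        server 10.0.0.12:8080 weight=1;
    }

    # Weighted least connections: traffic goes to the server with the
    # fewest active connections, adjusted for weight.
    upstream least_connected {
        least_conn;
        server 10.0.0.21:8080 weight=2;
        server 10.0.0.22:8080;
    }

    # Hashing: a key derived from the request (here the URI) maps each
    # request consistently to a specific server.
    upstream hashed {
        hash $request_uri consistent;
        server 10.0.0.31:8080;
        server 10.0.0.32:8080;
    }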
In addition to load distribution, a load balancer can enforce session persistence, also referred to as sticky sessions. This involves directing incoming client requests to the same backend server for the duration of a client session. Session persistence solves a problem that load balancing itself creates. A backend server may store data locally for a number of reasons—if, for example, the data set is too large to work with over the network in a timely fashion. In that event, the client will want subsequent requests directed to the same backend server—hence session persistence. If a client request is directed to a server that does not have access to the session state data, the client may be logged out or see inconsistent results between requests.
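As a minimal sketch, open source NGINX can approximate session persistence by hashing on the client’s IP address, so the same client keeps reaching the same backend (cookie-based stickiness via the sticky directive is an NGINX Plus feature). The addresses here are placeholders:

    # Session persistence via IP hashing: requests from the same client
    # address are always directed to the same backend server.
    upstream session_app {
        ip_hash;
        server 10.0.0.41:8080;
        server 10.0.0.42:8080;
    }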
A key feature of load balancers is to monitor the health of a server and to ensure that client requests are not directed to a backend server that is unavailable or unhealthy. A load balancer will either actively or passively monitor its backend servers and can mark them as unhealthy under certain conditions.
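For instance, open source NGINX monitors backends passively: a server that fails too often within a window is temporarily removed from rotation (active health checks are an NGINX Plus feature). A minimal sketch with placeholder addresses:

    # Passive health monitoring: after 3 failed attempts within 30
    # seconds, a server is marked unavailable for the next 30 seconds.
    upstream monitored_app {
        server 10.0.0.51:8080 max_fails=3 fail_timeout=30s;
        server 10.0.0.52:8080 max_fails=3 fail_timeout=30s;
    }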
Application Delivery and Load Balancing: A Solution Overview
Load balancing and application delivery are interrelated solutions. To understand their relationship and how load balancing is key to application delivery, let’s quickly review the delivery process without an ADC or load balancing:
1. An end user/client makes a request to connect with an application on a server. The request is routed over the internet to the application server.
2. The server accepts the connection and responds.
3. The user/client receives the response to their request.
Figure 1-2 illustrates this process. From the user/client perspective, this was a direct connection. The user asked the application to do something, and it responded.
In an application delivery environment, an ADC sits somewhere between the user/client and the virtual application servers where the requested service resides. The delivery process would look something like the following:
1. An end user/client makes a request to connect with an application on a server. The request is routed over the internet to the ADC.
2. The ADC decides to accept the connection and then matches the request with the appropriate destination.
3. The ADC makes a request to the designated application server.
4. The application server responds to the ADC.
5. The user/client receives the response to their request.
Figure 1-3 shows the application delivery process with an ADC. From the user/client perspective, this was a direct connection. They would have no indication that any action took place between them and the application. During the transaction, the ADC determines the appropriate application endpoint to respond to the client’s request. The diagram depicts the request being directed to Application A.
In general, an ADC sits between a client and a server. It does not matter whether the client is a user, a web application/service, or another ADC. In the eyes of an ADC, the connection being received is a client of the ADC, and its job is to handle each request according to its configuration.
With regard to redundancy and scale, an ADC is able to balance load over sets of virtual servers grouped by application. By combining the routing and control functionality of an ADC with load balancing, we reduce the number of hops a request must take through different layers and services, which optimizes network performance while improving availability, reliability, and application responsiveness.
Conclusion
While basic load balancing is still widely used, many web application teams are starting to see their load-balancing layer as the prime point at which to add more functionality, including Layer 7 request routing, request and authorization validation, and other features that will be covered in this book. The heart of application delivery is the ADC, an advanced load balancer that receives requests and directs them to servers to optimize performance and capacity, just as the heart delivers blood to the body’s organs. Without load balancing, most modern applications would fail. It’s not enough to simply make an application reachable; it must also be dependable, functional, and, most importantly, always available.
As companies move applications from on-premises to the cloud, software architects and cloud solution architects are looking at options to improve application delivery, load balancing, performance, security, and high availability for workloads. This book will provide a meaningful description of application delivery and load-balancing options available natively from Microsoft Azure and of the role NGINX can provide in a comprehensive solution.
In Chapter 2, we’ll explore the managed solutions available in Azure, including its native load balancers, Azure Application Gateway, Web Application Firewall, and Azure Front Door.