Load balancing is the act of distributing network traffic across a group of servers; a load balancer is a server that performs this action. Load balancing addresses the limits of what a single machine's hardware and software can handle. Here you will learn about the problems load balancing solves and how load balancing has evolved.
There are three important problem domains that load balancers were made to address: performance, availability, and economy.
As early computing and internet pioneers found, there are physical bounds to how much work a computer can do in a given amount of time. Luckily, these physical bounds increase at a seemingly exponential rate. However, the public's demand for fast, sophisticated software constantly pushes the bounds of machines, because we're piling hundreds to millions of users onto them. This is the performance problem.
Machine failure happens. You should avoid single points of failure whenever possible. This means that machines should have replicas. When you have replicas of servers, a machine failure is not a complete failure of your application. During a machine failure event, your customer should notice as little as possible. This is the availability problem: to avoid outages due to hardware failure, we need to run multiple machines, and be able to reroute traffic away from offline systems as fast as possible.
Now you could buy the latest and greatest machine every year to keep up with the growing demand of your user base, and you could buy a second one to protect yourself from assured failure, but this gets expensive. There are some cases where scaling vertically is the right choice, but for the vast majority of web application workloads it's not an economical procurement choice. The more powerful a machine is relative to others available at the time of its release, the greater the premium charged for its capacity.
These adversities spawned the need for distributing workloads over multiple machines. All of your users want what your services provide to be fast and reliable, and you want to provide them quality service with the highest return on investment. Load balancers help solve the performance, economy, and availability problems. Let’s look at how.
When faced with mounting demand from users, and maxing out the performance of the machine hosting your service, you have two options: scale up or scale out. Scaling up (i.e., vertical scaling) has physical computational limits. Scaling out (i.e., horizontal scaling) allows you to distribute the computational load across as many systems as necessary to handle the workload. When scaling out, a load balancer can help distribute the workload among an array of servers, while also allowing capacity to be added or removed as necessary.
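The scale-out approach described above can be sketched with round robin, the most basic strategy a load balancer might use to spread requests over a pool. This is a minimal illustration, not a production implementation, and the server names are hypothetical:

```python
from itertools import cycle

# Hypothetical pool of backend servers; in practice these would be the
# hostnames or IP addresses of real application servers.
servers = ["app1.example.com", "app2.example.com", "app3.example.com"]

# Round robin: hand each incoming request to the next server in turn,
# spreading the work evenly across the pool.
rotation = cycle(servers)

def route_request() -> str:
    """Pick the backend that should handle the next request."""
    return next(rotation)

# Ten requests are spread across the pool in rotation.
assignments = [route_request() for _ in range(10)]
```

Note that adding capacity here is just a matter of appending another entry to the pool, which is exactly the flexibility scaling out provides.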
You’ve probably heard the saying “Don’t put all your eggs in one basket.” This applies to your application stack as well. Any application in production should have a disaster strategy for as many failure types as you can think of. The best way to ensure that a failure isn’t a disaster is to have redundancy and an automatic recovery mechanism. Load balancing enables this type of strategy. Multiple machines are live at all times; if one fails it’s just a fraction of your capacity.
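The redundancy strategy above can be sketched by extending round robin with health state: when a node is marked down, it is simply removed from the rotation and the survivors absorb its share of traffic. This is a toy model, with health flipped by hand rather than by a real health-check process:

```python
class FailoverBalancer:
    """Round robin over only the servers currently marked healthy."""

    def __init__(self, servers: list[str]):
        # Each server carries a health flag that a monitoring process
        # would normally update; here we flip it manually.
        self.health = {s: True for s in servers}
        self._counter = 0

    def mark_down(self, server: str) -> None:
        self.health[server] = False

    def route(self) -> str:
        # Rebuild the live pool on every request so a failed node
        # immediately stops receiving traffic.
        pool = [s for s, ok in self.health.items() if ok]
        if not pool:
            raise RuntimeError("no healthy backends available")
        choice = pool[self._counter % len(pool)]
        self._counter += 1
        return choice

lb = FailoverBalancer(["app1", "app2", "app3"])

# All three servers share the load...
before = [lb.route() for _ in range(3)]

# ...then app2 fails; traffic is rerouted to the survivors.
lb.mark_down("app2")
after_failure = [lb.route() for _ in range(4)]
```

When one node fails, the pool loses only a fraction of its capacity and no request needs to reach the dead machine.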
Load balancing also offers economic benefits. Deploying a large server can be more expensive than using a pool of smaller ones. It's also cheaper and easier to add a small node to a pool than to upgrade and replace a large one. Most importantly, the protection against disasters strengthens your brand's reliability image, which is priceless.
The ability to disperse load between multiple machines solves these performance, availability, and economic problems, which is why load balancers continue to evolve.
Load balancers have come a long way since their inception. One way to load balance is through the Domain Name System (DNS), which would be considered client side. Another is to load balance on the server side, where traffic passes through a load balancing device that distributes load over a pool of servers. Both approaches are valid, but DNS-based, client-side load balancing is limited and should be used with caution: DNS records are cached according to their time-to-live (TTL) attribute, so clients may continue to be directed to non-operational nodes, and changes take effect only after the TTL expires. Server-side load balancing is powerful: it provides fine-grained control and enables immediate changes to the interaction between client and application. This book will mainly cover server-side load balancing.
Server-side load balancers have evolved from simply routing packets to being fully application aware. These two types are known as network load balancers and application load balancers, each named for the layer of the OSI model at which it operates.
Application load balancers are where the interesting advancements are happening. Because the load balancer is able to understand the packet at the application level, it has more context for the way it balances and routes traffic. Load balancers have also advanced in the variety of features they provide. Sitting inline with the presentation of the application, an application load balancer is a great place to add another layer of security, or to cache requests to lower response times.
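The extra context an application load balancer has can be sketched as routing on the request itself, here just the URL path, something a packet-routing network load balancer cannot see. The pool names and path prefixes are hypothetical:

```python
# Layer 7 routing table: match a request-path prefix to a backend pool.
# A network load balancer only sees addresses and ports; an application
# load balancer can inspect the request and choose accordingly.
ROUTES = [
    ("/static/", "cache-pool"),  # static assets served from a cache tier
    ("/api/", "api-pool"),       # API traffic sent to dedicated backends
]
DEFAULT_POOL = "web-pool"

def choose_pool(path: str) -> str:
    """Pick a backend pool based on the request path."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL
```

In this sketch, cacheable static content never touches the application servers at all, which is one way an application load balancer lowers response times.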
Even as load balancers have evolved, the earlier network load balancers remain relevant alongside the newer application load balancers. Network load balancers are great for simply and quickly distributing load. Application load balancers are important for routing specifics, such as session persistence and presentation. Later in this book, you will learn how all of these types of load balancing techniques work together to serve your goal of a highly performant, secure, and reliable application.