I recently sat down with Emil Stolarsky, a production engineer at Shopify, to talk about scriptable load balancers and why they’re so useful and powerful. Here are some highlights from our chat.
What is a scriptable load balancer? What benefits does it provide over traditional load balancers?
Until recently, the ability to add custom logic at the edge was available only to the select few organizations with the resources to build their own load balancers from scratch (e.g., GCLB, Proxygen). This approach is no longer reserved for the goliaths of our industry. With the rise of scriptable load balancer projects, organizations with as few as four engineers can now add logic at the edge.
How can scriptable load balancers impact SRE teams?
Currently, site reliability engineers are stuck with a product/service model. An SRE team is responsible for developing and running a service, and applications are welcome to use it. A common example is a database service within an organization (e.g., Bigtable, DynamoDB); applications call out to this service and are thus required to be aware of it. This is overwhelmingly the model used by organizations in our industry.
A different approach is service-level middleware. Before requests reach the application, they run through a middleware owned by site reliability engineers (e.g., a WAF or caching proxy). There are multiple benefits to this approach. Certain difficult problems become easier when they can be solved at a plane above applications. For example, writing a routing service becomes simpler when an application doesn’t have to be aware of where it’s running or how to backhaul traffic. Another benefit of service-level middlewares is their transparency to upstream applications. Product developers can free themselves of the overhead of calling out to an outside service. Finally, service-level middlewares provide high leverage on investment. They can be developed quickly and can operate on top of multiple services, achieving impressive scale.
As scriptable load balancers become accessible to smaller teams, they unlock a new suite of powerful tools.
What are some of the infrastructure problems that scriptable load balancers have helped your team solve at Shopify?
Scriptable load balancers have been an indispensable tool in scaling Shopify. We started using OpenResty when we moved to running Shopify out of multiple data centers. We needed a way to route each request to the correct data center based on the shard it belonged to. The result was Sorting Hat, a shard-aware routing layer that talks to MySQL and proxies requests to the correct upstream.
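To make the idea of shard-aware routing concrete, here is a minimal sketch in Python. It is not Sorting Hat's actual implementation: the class name, the lookup tables, and the upstream addresses are all hypothetical, and plain dictionaries stand in for the MySQL lookups mentioned above.

```python
from dataclasses import dataclass


@dataclass
class ShardRouter:
    """Illustrative shard-aware router (hypothetical, not Shopify's code)."""

    # Assumption: in-memory tables stand in for the database lookups.
    shop_to_shard: dict        # request host -> shard ID
    shard_to_upstream: dict    # shard ID -> upstream address
    default_upstream: str = "dc-default.internal:8080"  # hypothetical fallback

    def upstream_for(self, host: str) -> str:
        """Return the upstream that should receive a request for this host."""
        # Normalize the Host header: drop any port and ignore case.
        hostname = host.split(":")[0].lower()
        shard = self.shop_to_shard.get(hostname)
        if shard is None:
            # Unknown shop: fall back rather than fail the request.
            return self.default_upstream
        return self.shard_to_upstream.get(shard, self.default_upstream)
```

In this sketch, a proxy would call `upstream_for` with the incoming Host header and forward the request to the returned address, so the application itself never needs to know which data center it runs in.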
Scriptable load balancers have also allowed Shopify to handle some of the biggest flash sales. They have allowed us to add a stateless queueing layer before our checkout flow and develop a full-page caching system.
How would you recommend organizations start adopting scriptable load balancers on their way to embracing SRE practices?
I’d recommend starting with OpenResty, as it’s the most mature project in the space. Focus on building a good foundation, including a test harness and configuration management, before developing middlewares. From there, begin building experience with simple middlewares such as setting request ID headers or tracking response times.
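As a rough illustration of those two starter middlewares, here is a minimal sketch written as Python WSGI middleware rather than in OpenResty. The class name, the `X-Request-Id` header name, and the in-memory `timings` list are assumptions made for the example; a real deployment would more likely emit the timings to a metrics system.

```python
import time
import uuid


class RequestIdTimingMiddleware:
    """Hypothetical WSGI middleware: tags requests with an ID and times them."""

    def __init__(self, app, header="X-Request-Id"):
        self.app = app
        self.header = header
        # Assumption: timings are kept in memory as (request_id, seconds).
        self.timings = []

    def __call__(self, environ, start_response):
        key = "HTTP_" + self.header.upper().replace("-", "_")
        # Reuse an ID set by an earlier hop, otherwise mint a new one.
        request_id = environ.get(key) or uuid.uuid4().hex
        environ[key] = request_id

        def tagged_start_response(status, headers, exc_info=None):
            # Echo the request ID back on the response.
            headers = list(headers) + [(self.header, request_id)]
            return start_response(status, headers, exc_info)

        started = time.perf_counter()
        try:
            return self.app(environ, tagged_start_response)
        finally:
            # Measures time until the app returns its response iterable,
            # not the time spent streaming the body to the client.
            elapsed = time.perf_counter() - started
            self.timings.append((request_id, elapsed))
```

Wrapping an application is a one-liner, `app = RequestIdTimingMiddleware(app)`, and the wrapped application needs no changes, which is the transparency that makes this kind of middleware a low-risk place to start.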