eBPF and systems performance
Five questions for Brendan Gregg about improving the performance of Linux systems.
I recently sat down with Brendan Gregg, senior performance architect at Netflix, to talk about Linux system performance and ways to improve it. Here are some highlights from our chat.
What is eBPF and why is it useful?
eBPF is a weird Linux kernel technology that powers low-overhead custom analysis tools, which can be run in production to find performance wins no other tool can. With it, we can pull millions of new metrics out of the kernel and applications, and explore running software like never before. It’s a superpower. It’ll benefit many people on Linux, who’ll gain a toolkit of new analysis tools or new plugins for deep monitoring. That’s what I’ll show in my Velocity talk: new tools you can use.
eBPF is short for “extended Berkeley Packet Filter,” which is an in-kernel virtual machine originally used to run mini filter programs efficiently, specifically the filter expressions of tcpdump. People realized that having a virtual machine in the kernel that can safely run user-defined programs has uses beyond packet filtering.
I’m using eBPF for Linux tracing, with the kernel’s static and dynamic instrumentation: tracepoints, kprobes, and uprobes. There have been plenty of tracers for Linux before, so you might wonder why eBPF is different. One reason is that it’s been integrated into the Linux kernel, so it’s not a third-party add-on; if you’re on a modern kernel, you may already have it available on your systems. Another reason is that it’s programmatic. The other built-in tracers in Linux were used in a trace, dump, and post-process manner, where they would dump fixed event details. For frequent events, like scheduling activity, the overhead of dumping and post-processing every event can get too high. eBPF, on the other hand, can summarize data in kernel context and emit only the summary you care about to user level: for example, latency histograms of file system I/O, as sketched below.
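To make that concrete, here’s a minimal sketch of the in-kernel summarization pattern, using the BCC Python front end for eBPF. It’s illustrative rather than anything specific from our chat: it assumes BCC is installed, and the vfs_read() target and 10-second duration are arbitrary choices.

```python
# Minimal sketch: in-kernel latency histogram via BCC (assumes BCC is installed)
from bcc import BPF
import time

prog = """
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);   // per-thread start timestamps
BPF_HISTOGRAM(dist);         // in-kernel latency histogram

int trace_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;                         // missed the entry probe
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    dist.increment(bpf_log2l(delta_us));  // summarize in kernel context
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
b.attach_kretprobe(event="vfs_read", fn_name="trace_return")

print("Tracing vfs_read() latency for 10 seconds...")
time.sleep(10)
b["dist"].print_log2_hist("usecs")   # only the summary crosses to user level
```

The two eBPF programs run in the kernel: one records a timestamp at function entry, the other computes the latency at return and increments a histogram bucket. Nothing is emitted per event; only the finished histogram is copied out to user level.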
What do engineers need to understand about systems performance these days?
Engineers need to know that the operating system and kernel can make a great platform for application analysis, in addition to application-specific tools. Some things are easier to debug from the kernel, like how file systems and storage devices are performing, how the scheduler may be blocking the application, and where networking bottlenecks are. The kernel can also examine areas that the application is blind to.
While there are many performance tools and metrics from the system, the most essential are CPU flame graphs and off-CPU flame graphs. CPU flame graphs explain why code is executing on-CPU, and eBPF provides a way to capture this efficiently: frequency-counting stack traces in kernel context (see the sketch after this paragraph). Off-CPU flame graphs visualize why applications are blocking and not running, whether that’s due to disk or network I/O, run queue latency, or really anything else. They weren’t practical to use before eBPF, since they involve tracing frequent scheduler events, which really benefits from eBPF’s in-kernel summarization.
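As a hedged illustration of frequency-counting stacks in kernel context, here’s a sketch in the same BCC style, loosely modeled on BCC’s profile tool. The 99 Hertz sampling rate and 10-second duration are illustrative assumptions.

```python
# Hedged sketch of in-kernel stack counting, loosely modeled on BCC's profile tool
from bcc import BPF, PerfType, PerfSWConfig
import time

prog = """
#include <uapi/linux/bpf_perf_event.h>

BPF_STACK_TRACE(stacks, 8192);   // storage for unique stack traces
BPF_HASH(counts, int, u64);      // stack id -> sample count

int do_sample(struct bpf_perf_event_data *ctx) {
    int stackid = stacks.get_stackid(&ctx->regs, 0);  // kernel stack only
    if (stackid >= 0)
        counts.increment(stackid);                    // count in kernel context
    return 0;
}
"""

b = BPF(text=prog)
b.attach_perf_event(ev_type=PerfType.SOFTWARE,
                    ev_config=PerfSWConfig.CPU_CLOCK,
                    fn_name="do_sample", sample_freq=99)

time.sleep(10)

# Only the summarized counts cross to user level; most frequent stack prints last
for stackid, count in sorted(b["counts"].items(), key=lambda kv: kv[1].value):
    for addr in b["stacks"].walk(stackid.value):
        print("  %s" % b.ksym(addr).decode("utf-8", "replace"))
    print("  samples: %d\n" % count.value)
```

The per-stack counts are the raw input a flame graph script would render; the key point is that the counting happens in the kernel, so sampling stays cheap even at high rates.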
What are some of the more exciting trends (or projects) you see in the Linux performance space?
Other uses of eBPF are exciting. There’s the eXpress Data Path (XDP) project, added in Linux 4.8, which uses eBPF for a fast lane through the TCP/IP stack and for DDoS protection; it’s already in use by some major companies. Other exciting projects include the BBR TCP congestion control algorithm, added in Linux 4.9; many new features in recent versions of the Linux perf tool; and enhancements to cgroups.
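For a flavor of what an XDP program looks like, here’s a toy sketch that uses BCC to attach an eBPF filter at the driver level and drop ICMP packets before they reach the network stack. It’s the same mechanism a DDoS filter would use, with real match rules in place of this one; the eth0 device name is an assumption, and this is illustrative code, not the XDP project’s own.

```python
# Toy XDP sketch via BCC (illustrative only, not the XDP project's own code)
from bcc import BPF
import time

prog = """
#include <uapi/linux/bpf.h>
#include <linux/in.h>
#include <linux/if_ether.h>
#include <linux/ip.h>

int xdp_drop_icmp(struct xdp_md *ctx) {
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;               // packet too short: let it through
    if (eth->h_proto != htons(ETH_P_IP))
        return XDP_PASS;               // not IPv4

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    if (ip->protocol == IPPROTO_ICMP)
        return XDP_DROP;               // drop in the driver, before the stack

    return XDP_PASS;
}
"""

device = "eth0"  # assumption: adjust to a real interface
b = BPF(text=prog)
fn = b.load_func("xdp_drop_icmp", BPF.XDP)
b.attach_xdp(device, fn, 0)

print("Dropping ICMP on %s for 30 seconds..." % device)
time.sleep(30)
b.remove_xdp(device, 0)
```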
There are also interesting developments in how we’re running Linux. Containers provide more flexibility for controlling resource usage via cgroups, and provide a low-overhead environment that’s easy to configure. Hardware virtualization technologies, including Xen, are getting faster too. A major cloud provider just launched an instance type that uses SR-IOV for direct access to storage devices, and already has that for networking. Much of what we knew and assumed about these virtualization technologies is changing for the better.
What recommendations do you have for engineers who want to get started with Linux systems engineering?
The most important thing is to get exposure to what’s possible: turning your unknown unknowns into known unknowns. Systems performance analysis is a big field, though, and for many engineers, performance analysis with OS tools is only a fraction of an already busy job. An efficient way to get started and gain that exposure is to catch some conference talks on the topic.
You’re speaking at the O’Reilly Velocity Conference in San Jose this June. What presentations are you looking forward to attending while there?
It’s a tough choice as there are some great engineers from Netflix speaking, including a keynote by Dianne Marsh. I’ll also probably be ducking into talks like “Performance in a hyperscaling world”, “Scheduling deep dive for orchestration systems”, “The problem with preaggregated metrics”, “PinTrace: A distributed tracing pipeline”, and “Our many monitoring monsters”.