Chapter 4. eBPF Complexity

You’ve now seen an example of eBPF programming to give you a flavor of how it works. While basic examples can make eBPF seem relatively straightforward, there are some complexities that make it challenging.

One area that has historically made it relatively difficult to write and distribute eBPF programs is kernel compatibility.

Portability Across Kernels

eBPF programs can access kernel data structures, and these may change across different kernel versions. The structures themselves are defined in header files that form part of the Linux source code. Historically, you had to compile your eBPF programs against the set of header files matching the kernel on which you wanted to run them.

BCC Approach to Portability

To address portability across kernels, the BCC (BPF Compiler Collection) project [1] took the approach of compiling eBPF code at runtime, in situ on the destination machine. This means the compilation toolchain needs to be installed on every destination machine where you want the code to run [2], and you have to wait for the compilation to complete before the tool starts. You also have to hope that the kernel headers are present on the filesystem (and that’s not always the case). Enter BPF CO-RE.

CO-RE

The CO-RE—compile once, run everywhere—approach consists of a few elements:

BTF (BPF Type Format)

This is a format for expressing the layout of data structures and function signatures. Modern Linux kernels support BTF, so that you can generate a header file called vmlinux.h from a running system, containing all the data structure information about a kernel that a BPF program might need.

libbpf, the BPF library

On the one hand, libbpf provides functions for loading eBPF programs and maps into the kernel. But it also plays an important role in portability: it leans on BTF information to adjust the eBPF code to compensate for any differences between the data structures present when it was compiled, and what’s on the destination machine.

Compiler support

The clang compiler was enhanced so that when it compiles eBPF programs, it includes what are known as BTF relocations, which are what libbpf uses to know what to adjust as it loads BPF programs and maps into the kernel.

Optionally, a BPF skeleton

A skeleton can be autogenerated from a compiled BPF object file using bpftool gen skeleton, containing handy functions that user space code can call to manage the lifecycle of BPF programs—loading them into the kernel, attaching them to events and so on. These functions are higher-level abstractions that can be more convenient for the developer than using libbpf directly.

For a more detailed explanation of CO-RE, read Andrii Nakryiko’s excellent description.
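To make the relocation idea concrete, here is a toy model in plain Python. It is only a sketch of the concept, not how libbpf or clang work internally: the two layouts, the field names, and the helper functions `compile_program`, `relocate`, and `run` are all hypothetical. The "compiler" records each field access by name, and the "loader" patches the offsets using the target kernel's type information.

```python
import struct

# Hypothetical layouts of the same structure in two kernel versions.
# Each maps a field name to (byte offset, struct format code).
LAYOUT_BUILD = {"pid": (0, "<i"), "uid": (4, "<i")}
LAYOUT_TARGET = {"flags": (0, "<q"), "pid": (8, "<i"), "uid": (12, "<i")}


def compile_program(fields, build_layout):
    """Record each field access by name, with the offset seen at build time.

    These records stand in for the relocation information a compiler emits.
    """
    return [
        {"field": f, "offset": build_layout[f][0], "fmt": build_layout[f][1]}
        for f in fields
    ]


def relocate(program, target_layout):
    """Return a copy of the program patched with the target kernel's offsets.

    This stands in for the adjustment a loader makes from type information.
    """
    patched = []
    for access in program:
        name = access["field"]
        if name not in target_layout:
            raise LookupError(f"field {name!r} missing on target kernel")
        offset, fmt = target_layout[name]
        patched.append({"field": name, "offset": offset, "fmt": fmt})
    return patched


def run(program, memory):
    """Read each field from raw bytes at the offsets stored in the program."""
    return {
        a["field"]: struct.unpack_from(a["fmt"], memory, a["offset"])[0]
        for a in program
    }


# Raw bytes as laid out on the target kernel: flags=0, pid=1234, uid=1000.
target_memory = struct.pack("<qii", 0, 1234, 1000)

program = compile_program(["pid", "uid"], LAYOUT_BUILD)
stale = run(program, target_memory)  # build-time offsets read the wrong bytes
fixed = run(relocate(program, LAYOUT_TARGET), target_memory)
```

Without relocation, the program reads the bytes of the new `flags` field and reports zeros; after relocation, the same program reads the intended fields. The real mechanism works on compiled BPF instructions rather than Python dictionaries, but the division of labor is the same.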

BTF information in the form of a vmlinux file has been included in the Linux kernel since version 5.4 [3], but raw BTF data that libbpf can make use of can also be generated for older kernels. There’s information on how to generate BTF files, and an archive of files for a variety of Linux distributions, on the BTF Hub.
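A tool might decide at startup which BTF source to use. The sketch below assumes the kernel exposes its BTF at `/sys/kernel/btf/vmlinux` (the usual location on BTF-enabled kernels) and otherwise falls back to a separately obtained file. The function name `find_btf`, the `external_dir` parameter, and the `<release>.btf` file naming are assumptions made for illustration.

```python
import platform
from pathlib import Path

# Usual location of kernel-provided BTF on BTF-enabled kernels.
KERNEL_BTF = Path("/sys/kernel/btf/vmlinux")


def find_btf(external_dir=None, release=None, kernel_btf=KERNEL_BTF):
    """Return a path to usable BTF data, or None if none can be found.

    Prefers BTF shipped by the running kernel. Otherwise looks for
    '<release>.btf' in external_dir (a hypothetical naming convention).
    """
    kernel_btf = Path(kernel_btf)
    if kernel_btf.is_file():
        return kernel_btf
    if external_dir is not None:
        # platform.release() reports the same string as `uname -r`.
        release = release or platform.release()
        candidate = Path(external_dir) / f"{release}.btf"
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_btf()` with no arguments checks only the kernel-provided file; passing `external_dir` enables the fallback for older kernels.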

The BPF CO-RE approach makes it far easier than it used to be for an eBPF programmer to get their code to run on any Linux distribution—or at least, on any Linux distribution new enough to have support for whatever set of eBPF capabilities their program uses. But this doesn’t make eBPF programming a walk in the park: it’s still essentially kernel programming.

Linux Kernel Knowledge

It quite quickly becomes clear that you need some domain knowledge about the Linux kernel in order to write more advanced tools. You’ll need to understand the data structures you have access to, which depend on the context in which your eBPF code is called. Not every application developer has experience in parsing network packets, accessing socket buffers, or handling the arguments to a system call.

How will the kernel react to your eBPF code’s behavior? As you learned in Chapter 2, the kernel consists of millions of lines of code. Its documentation can be sparse, so you might find yourself having to read kernel source code to figure out how something works.

You’ll also need to figure out what events your eBPF code should attach to. With the option to attach a kprobe to any function entry point in the entire kernel, it might not be an easy decision. In some cases, it’s straightforward—for example, if you want to access an incoming network packet, then the XDP hook on the appropriate network interface is an obvious choice. If you want to provide observability into a particular kernel event, it may not be terribly hard to find the appropriate point within the kernel code.

But in other cases, the choice may be less obvious. As an example, tools that simply use kprobes to hook into the functions that make up the kernel’s syscall interface may be subject to a security exploit known as a time-of-check to time-of-use (TOCTTOU) attack. An attacker has a small window of opportunity in which they can change a syscall’s arguments after the eBPF code has read them, but before they have been copied into kernel memory. There was an excellent presentation on this at DEF CON 29 [4] by Rex Guo and Junyuan Zeng. Some of the most widely used eBPF tooling was written in quite a naive way and is subject to this kind of attack. It’s not an easy exploit, and there are ways to mitigate these attacks, but if you’re protecting highly sensitive data against sophisticated, motivated adversaries, please dig in to understand whether the tools you use might be affected.
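The ordering problem can be shown with a deterministic toy model. This is plain Python with no real syscalls or eBPF involved; `UserBuffer` and `simulate_open` are hypothetical names, and the race is modeled as a callback that runs between the two reads rather than as a real concurrent thread.

```python
class UserBuffer:
    """Stand-in for syscall argument memory that user space can still modify."""

    def __init__(self, data):
        self.data = bytearray(data)

    def read(self):
        return bytes(self.data)

    def write(self, data):
        self.data[:] = data


def simulate_open(buffer, attacker=None):
    """Model one syscall: the monitor reads at entry, the kernel copies later."""
    seen_by_monitor = buffer.read()  # time of check: probe at syscall entry
    if attacker is not None:
        attacker(buffer)             # race window: the argument is rewritten
    used_by_kernel = buffer.read()   # time of use: kernel copies the argument
    return seen_by_monitor, used_by_kernel


benign = simulate_open(UserBuffer(b"/tmp/harmless"))
attacked = simulate_open(
    UserBuffer(b"/tmp/harmless"),
    attacker=lambda buf: buf.write(b"/etc/shadow"),
)
```

In the attacked case the monitor records `/tmp/harmless` while the kernel acts on `/etc/shadow`. The general direction of a mitigation is to observe the argument at a point after the kernel has made its own copy, so that monitor and kernel see the same bytes.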

You’ve already seen how BPF CO-RE enables eBPF programs to work on different kernel versions, but it only takes into account the changes in data structure layout and not broader changes to kernel behavior. For example, if you want to attach an eBPF program to a particular function or tracepoint in the kernel, you may need a Plan B for what to do if that function or tracepoint doesn’t exist in a different kernel version.
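One possible shape for such a Plan B is an ordered list of candidate attach points, checked against the symbols the running kernel actually exposes. The sketch below parses text in the `/proc/kallsyms` format (address, type, name, optional module). The helper names, the sample addresses, and the choice of `do_sys_openat2` and `do_sys_open` as candidates are illustrative assumptions; which symbols appear depends on the kernel version and build.

```python
def parse_kallsyms(text):
    """Collect symbol names from text in /proc/kallsyms format."""
    names = set()
    for line in text.splitlines():
        parts = line.split()
        # Expected fields: address, type, name, and optionally [module].
        if len(parts) >= 3:
            names.add(parts[2])
    return names


def pick_attach_target(candidates, available):
    """Return the first candidate present on this kernel, or None."""
    for name in candidates:
        if name in available:
            return name
    return None


# Made-up sample data standing in for two different kernels.
NEWER_KERNEL = (
    "ffffffff812f0a10 t do_sys_openat2\n"
    "ffffffff812f0c40 T do_sys_open\n"
)
OLDER_KERNEL = "ffffffff812a1b20 T do_sys_open\n"

PREFERENCES = ["do_sys_openat2", "do_sys_open"]  # most preferred first

newer_choice = pick_attach_target(PREFERENCES, parse_kallsyms(NEWER_KERNEL))
older_choice = pick_attach_target(PREFERENCES, parse_kallsyms(OLDER_KERNEL))
```

On a real system the text would come from reading `/proc/kallsyms`, and a result of `None` would signal that the tool has to degrade gracefully or refuse to start.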

Coordinating Multiple eBPF Programs

A lot of eBPF-based tools available today offer a suite of observability capabilities, enabled by hooking eBPF programs into a set of kernel events. Much of this was pioneered by the work that Brendan Gregg and others did in the BCC and bpftrace tools. Today’s generation of (often commercial) tools may offer much prettier graphics and UIs, but the eBPF programs they leverage are heavily based on those originals.

Things get considerably more complicated when you want to write code that coordinates interactions between different types of events. As an example, Cilium sees network packets at a variety of points through the kernel’s networking stack [5], and manipulates traffic based on information from the Kubernetes CNI (Container Network Interface) about Kubernetes pods. Building this system requires Cilium developers to have an in-depth understanding of how the kernel handles network traffic, and how the user space concepts of “pods” and “containers” map to kernel concepts like cgroups and namespaces. In practice, several Cilium maintainers are also kernel developers working on enhancements to eBPF and to networking support; hence, they have this knowledge.
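As a small illustration of that mapping (not of how Cilium does it), the sketch below recovers a pod UID from text in the `/proc/<pid>/cgroup` format, where each line is `hierarchy-ID:controller-list:cgroup-path`. The assumption that the path contains `pod<UID>`, with dashes possibly written as underscores, reflects common Kubernetes setups but varies with the container runtime and cgroup driver. The function name and the sample paths are hypothetical.

```python
import re

# Assumed pattern: 'pod' followed by a UUID whose separators are '-' or '_'.
POD_UID = re.compile(
    r"pod([0-9a-fA-F]{8}(?:[-_][0-9a-fA-F]{4}){3}[-_][0-9a-fA-F]{12})"
)


def pod_uid_from_cgroup(text):
    """Extract a pod UID from /proc/<pid>/cgroup-style text, or return None."""
    for line in text.splitlines():
        # Split into hierarchy ID, controller list, and cgroup path.
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = POD_UID.search(parts[2])
        if match:
            # Normalize underscore-separated UIDs to the dashed form.
            return match.group(1).replace("_", "-")
    return None


# Made-up sample lines for a process inside a pod and one outside.
IN_POD = (
    "0::/kubepods.slice/kubepods-burstable.slice/"
    "kubepods-burstable-pod12345678_1234_1234_1234_123456789abc.slice/"
    "cri-containerd-abc.scope\n"
)
NOT_IN_POD = "0::/user.slice/user-1000.slice/session-3.scope\n"

pod_uid = pod_uid_from_cgroup(IN_POD)
no_pod = pod_uid_from_cgroup(NOT_IN_POD)
```

The kernel itself has no notion of a pod; the identity has to be reconstructed from kernel-level objects such as cgroup paths, which is part of what makes coordinating kernel events with orchestrator concepts hard.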

The bottom line is that although eBPF offers an extremely efficient and powerful platform for hooking into the kernel, it’s nontrivial for the average developer without significant kernel experience. If you’re interested in getting your hands dirty with eBPF programming, I highly encourage it as a learning exercise; building up experience in this area could be highly valuable since it’s bound to continue to be a sought-after specialist skill for years to come. Realistically, though, most organizations are unlikely to build much bespoke eBPF tooling in-house and will instead leverage projects and products from the specialist eBPF community.

Let’s move on to considering why these eBPF-based projects and products are particularly powerful in a cloud native environment.

[1] You’ll find BCC at this GitHub page.

[2] Some projects take the approach of packaging the eBPF source plus the required toolchain into a container image. This avoids the complexity of installing that toolchain and any concomitant dependency management, but it still means that the compilation step runs on the destination machine.

[3] See Andrii Nakryiko’s IO Visor post for more information.

[4] Rex Guo and Junyuan Zeng, “Phantom Attack: Evading System Call Monitoring,” (DEF CON, August 5–8, 2021).

[5] The Cilium documentation describes how eBPF programs attached to different networking hooks are combined to achieve complex networking capabilities.
