Chapter 1. Introducing Falco

The goal of this first chapter of the book is to explain what Falco is. Don’t worry, we’ll take it easy! We will first look at what Falco does, including a high-level view of its functionality and an introductory description of each of its components. We’ll explore the design principles that inspired Falco and still guide its development today. We’ll then discuss what you can do with Falco, what is outside its domain, and what you can better accomplish with other tools. Finally, we’ll provide some historical context to put things into perspective.

Falco in a Nutshell

At the highest level, Falco is pretty straightforward: you deploy it by installing multiple sensors across a distributed infrastructure. Each sensor collects data (from the local machine or by talking to some API), runs a set of rules against it, and notifies you if something bad happens. Figure 1-1 shows a simplified diagram of how it works.

Figure 1-1. Falco’s high-level architecture

You can think of Falco as a network of security cameras for your infrastructure: you place the sensors in key locations, they observe what’s going on, and they ping you if they detect harmful behavior. With Falco, bad behavior is defined by a set of rules that the community creates and maintains for you and that you can customize or extend for your needs. The alerts generated by your fleet of Falco sensors can in theory stay on the local machine, but in practice they are typically exported to a centralized collector. For centralized alert collection, you can use a general-purpose security information and event management (SIEM) tool or a specialized tool like Falcosidekick. (We’ll cover alert collection extensively in Chapter 12.)

Now let’s dig a little deeper into the Falco architecture and explore its main components, starting with the sensors.

Sensors

Figure 1-2 shows how Falco sensors work.

Figure 1-2. Falco sensor architecture

The sensor consists of an engine that has two inputs: a data source and a set of rules. The sensor applies the rules to each event coming from the data source. When a rule matches an event, an output message is produced. Very straightforward, right?

Data Sources

Each sensor is able to collect input data from a number of sources. Originally, Falco was designed to exclusively operate on system calls, which to date remain one of its most important data sources. We’ll cover system calls in detail in Chapters 3 and 4, but for now you can think of them as what a running program uses to interface with its external world. Opening or closing a file, establishing or receiving a network connection, reading and writing data to and from the disk or the network, executing commands, and communicating with other processes using pipes or other types of interprocess communication are all examples of system call usage.

Falco collects system calls by instrumenting the kernel of the Linux operating system (OS). It can do this in two different ways: by deploying a kernel module (i.e., a piece of executable code that can be installed in the operating system kernel to extend the kernel’s functionality) or by using a technology called eBPF, which makes it possible to safely run small programs inside the OS kernel. We’ll talk extensively about kernel modules and eBPF in Chapter 4.

Tapping into this data gives Falco incredible visibility into everything that is happening in your infrastructure. Here are some examples of things Falco can detect for you (with a sample rule sketch after the list):

  • Privilege escalations

  • Access to sensitive data

  • Ownership and mode changes

  • Unexpected network connections or socket mutations

  • Unwanted program execution

  • Data exfiltration

  • Compliance violations
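
To make this concrete, here is a minimal rule sketch in the same YAML format you’ll see throughout the book. It flags writes to files under /etc, one simple flavor of “access to sensitive data.” The rule name and the /etc scoping are ours for illustration; the fields it uses (evt.type, evt.is_open_write, fd.name) are standard syscall fields exposed by Falco’s kernel instrumentation.

# Illustrative sketch, not one of Falco's shipped rules
- rule: write_below_etc_sketch
  desc: a file below /etc was opened for writing
  condition: evt.type in (open, openat) and evt.is_open_write = true and fd.name startswith /etc
  output: File below /etc opened for writing (file=%fd.name command=%proc.cmdline user=%user.name)
  priority: WARNING
  source: syscall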

Falco has also been extended to tap into other data sources besides system calls (we’ll show you examples throughout the book). For example, Falco can monitor your cloud logs in real time and notify you when something bad happens in your cloud infrastructure. Here are some more examples of things it can detect for you (a sample rule sketch follows the list):

  • When a user logs in without multifactor authentication

  • When a cloud service configuration is modified

  • When somebody accesses one or more sensitive files in an Amazon Web Services (AWS) S3 bucket
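
To give you a flavor of what this looks like, here is a hedged sketch of a rule over cloud audit events, written in the style of the AWS CloudTrail plugin’s event source. Treat the event name, the bucket name, and the ct.*/s3.* field names as illustrative; the exact fields available depend on the plugin you load.

# Illustrative sketch of a cloud-log rule
- rule: sensitive_bucket_object_read_sketch
  desc: an object was fetched from a bucket considered sensitive
  condition: ct.name = "GetObject" and s3.bucket = "acme-payroll-backups"
  output: Object read from sensitive S3 bucket (user=%ct.user object=%s3.uri source_ip=%ct.srcip)
  priority: WARNING
  source: aws_cloudtrail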

New data sources are added to Falco frequently, so we recommend checking the website and Slack channel to keep up with what’s new.

Rules

Rules tell the Falco engine what to do with the data coming from the sources. They allow the user to define policies in a compact and readable format. Falco comes preloaded with a comprehensive set of rules that cover host, container, Kubernetes, and cloud security, and you can easily create your own rules to customize it. We’ll spend a lot of time on rules, in particular in Chapters 7 and 13; by the time you’re done reading this book, you’ll be a total master of them. Here’s an example to whet your appetite:

- rule: shell_in_container
  desc: shell opened inside a container
  condition: spawned_process and container.id != host and proc.name = bash
  output: shell in a container (user=%user.name container_id=%container.id)
  source: syscall
  priority: WARNING

This rule detects when a bash shell is started inside a container, which is normally not a good thing in an immutable container-based infrastructure. The core entries in a rule are the condition, which tells Falco what to look at, and the output, which is what Falco will tell you when the condition triggers. As you can see, both the condition and the output act on fields, one of the core concepts in Falco. The condition is a Boolean expression that combines checks of fields against values (essentially, a filter). The output is a combination of text and field names, whose values will be printed out in the notification. Its syntax is similar to that of a print statement in a programming language.

Does this remind you of networking tools like tcpdump or Wireshark? Good eye: they were a big inspiration for Falco.

Data Enrichment

Rich data sources and a flexible rule engine help make Falco a powerful runtime security tool. On top of that, metadata from a disparate set of providers enriches its detections.

When Falco tells you that something has happened—for example, that a system file has been modified—you typically need more information to understand the cause and the scope of the issue. Which process did this? Did it happen in a container? If so, what were the container and image names? What was the service/namespace where this happened? Was it in production or in dev? Was this a change made by root?

Falco’s data enrichment engine helps answer all of these questions by building up the environment state, including running processes and threads, the files they have open, the containers and Kubernetes objects they run in, etc. All of this state is accessible to Falco’s rules and outputs. For example, you can easily scope a rule so that it triggers only in production or in a specific service.
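
As a small, hedged example of what this scoping looks like in practice, the following sketch narrows the earlier shell-in-container detection to a single Kubernetes namespace and container image using enrichment fields (k8s.ns.name, k8s.pod.name, container.image.repository). The namespace and image values are made up, and spawned_process is a macro shipped with the default ruleset.

# Illustrative sketch: scoping a detection with enrichment metadata
- rule: shell_in_payments_prod_sketch
  desc: interactive shell spawned in the payments production namespace
  condition: spawned_process and proc.name in (bash, sh, zsh) and k8s.ns.name = payments-prod and container.image.repository = "registry.example.com/payments/api"
  output: Shell in production payments pod (pod=%k8s.pod.name image=%container.image.repository user=%user.name)
  priority: WARNING
  source: syscall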

Output Channels

Every time a rule is triggered, the corresponding engine emits an output notification. In the simplest possible configuration, the engine writes the notification to standard output (which, as you can imagine, usually isn’t very useful). Fortunately, Falco offers sophisticated ways to route outputs and direct them to a bunch of places, including log collection tools, cloud storage services like S3, and communication tools like Slack and email. Its ecosystem includes a fantastic project called Falcosidekick, specifically designed to connect Falco to the world and make output collection effortless (see Chapter 12 for more on this).
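
To give you an idea of how this is configured, here is a hedged sketch of the output-channel section of Falco’s configuration file (falco.yaml). The keys shown are the standard ones; the file path and the Falcosidekick URL are example values.

# Illustrative falco.yaml excerpt
json_output: true

stdout_output:
  enabled: true

file_output:
  enabled: true
  keep_alive: false
  filename: /var/log/falco/events.json

http_output:
  enabled: true
  url: http://falcosidekick.falco.svc.cluster.local:2801/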

Containers and More

Falco was designed for the modern world of cloud native applications, so it has excellent out-of-the-box support for containers, Kubernetes, and the cloud. Since this book is about cloud native security, we will mostly focus on that, but keep in mind that Falco is not limited to containers and Kubernetes running in the cloud. You can absolutely use it as a host security tool, and many of its preloaded rules can help you secure your fleet of Linux servers. Falco also has good support for network detection, allowing you to inspect the activity of connections, IP addresses, ports, clients, and servers and receive alerts when they show unwanted or unexpected/atypical behavior.
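
As a taste of the network side, here is a hedged rule sketch that alerts on outbound connections to ports outside a small allowed set. The port list and rule name are illustrative; the fd.* fields describing connection endpoints are part of Falco’s standard syscall fields.

# Illustrative sketch of a network-oriented rule
- rule: unexpected_outbound_port_sketch
  desc: outbound connection to a destination port outside the allowed set
  condition: evt.type = connect and evt.dir = < and fd.type in (ipv4, ipv6) and not fd.sport in (53, 80, 443)
  output: Unexpected outbound connection (connection=%fd.name process=%proc.name)
  priority: NOTICE
  source: syscall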

Falco’s Design Principles

Now that you understand what Falco does, let’s talk about why it is the way it is. When you’re developing a piece of software of non-negligible complexity, it’s important to focus on the right use cases and prioritize the most important goals. Sometimes that means accepting trade-offs. Falco is no exception. Its development has been guided by a core set of principles. In this section we will explore why they were chosen and how each of them affects Falco’s architecture and feature set. Understanding these principles will allow you to judge whether Falco is a good fit for your use cases and help you get the most out of it.

Specialized for Runtime

The Falco engine is designed to detect threats while your services and applications are running. When it detects unwanted behavior, Falco should alert you instantly (at most in a matter of seconds) so you’re informed (and can react!) right away, not after minutes or hours have passed.

This design principle manifests in three important architectural choices. First, the Falco engine is engineered as a streaming engine, able to process data quickly as it arrives rather than storing it and acting on it later. Second, it’s designed to evaluate each event independently, not to generate alerts based on a sequence of events; this means correlating different events, even if feasible, is not a primary goal and is in fact discouraged. Third, Falco evaluates rules as close as possible to the data source. If possible, it avoids transporting information before processing it and favors deploying richer engines on the endpoints.

Suitable for Production

You should be able to deploy Falco in any environment, including production environments where stability and low overhead are of paramount importance. It should not crash your apps and should strive to slow them down as little as possible.

This design principle affects the data collection architecture, particularly when Falco runs on endpoints that have many processes or containers. Falco’s drivers (the kernel module and eBPF probe) have undergone many iterations and years of testing to guarantee their performance and stability. Collecting data by tapping into the kernel of the operating system, as opposed to instrumenting the monitored processes/containers, guarantees that your applications won’t crash because of bugs in Falco.

The Falco engine is written in C++ and employs a number of optimizations to reduce resource consumption. For example, it avoids processing system calls that read or write disk or network data. In some ways this is a limitation, because it prevents users from creating rules that inspect the content of payloads, but it also ensures that CPU and memory consumption stay low, which is more important.

Intent-Free Instrumentation

Falco is designed to observe application behavior without requiring users to recompile applications, install libraries, or rebuild containers with monitoring hooks. This is very important in modern containerized environments, where applying changes to every component would require an unrealistic amount of work. It also guarantees that Falco sees every process and container, no matter where it comes from, who runs it, or how long it’s been around.

Optimized to Run at the Edge

Compared to other policy engines (for example, OPA), Falco has been explicitly designed with a distributed, multisensor architecture in mind. Its sensors are designed to be lightweight, efficient, and portable, and to operate in diverse environments. It can be deployed on a physical host, in a virtual machine, or as a container. The Falco binary is built for multiple platforms, including ARM.

Avoids Moving and Storing a Ton of Data

Most currently marketed threat detection products are based on sending large numbers of events to a centralized SIEM tool and then performing analytics on top of the collected data. Falco is designed around a very different principle: stay as close as possible to the endpoint, perform detections in place, and only ship alerts to a centralized collector. This approach results in a solution that is a bit less capable at performing complex analytics, but is simple to operate, much more cost-effective, and scales very well horizontally.

Scalable

Speaking of scale, another important design goal underlying Falco is that it should be able to scale to support the biggest infrastructures in the world. If you can run it, Falco should be able to secure it. As we’ve just described, keeping limited state and avoiding centralized storage are important elements of this. Edge computing is an important element too, since distributing rule evaluation is the only approach to scale a tool like Falco in a truly horizontal way.

Another key part of scalability is endpoint instrumentation. Falco’s data collection stack doesn’t use techniques like sidecars, library linking, or process instrumentation. The reason is that the resource utilization of all of these techniques grows with the number of containers, libraries, or processes to monitor. Busy machines have many containers, libraries, and processes—too many for these techniques to work—but they have only one operating system kernel. Capturing system calls in the kernel means that you need only one Falco sensor per machine, no matter how big the machine is. This makes it possible to run Falco on big hosts with a lot of activity.

Truthful

One other benefit of using system calls as a data source? System calls never lie. Falco is hard to evade because the mechanism it uses to collect data is very difficult to disable or circumvent. If you try to evade or get around it, you will leave traces that Falco can capture.

Robust Defaults, Richly Extensible

Another key design goal was minimizing the time it takes to extract value from Falco. You should be able to do this by just installing it; you shouldn’t need to customize it unless you have advanced requirements.

Whenever the need for customization does arise, though, Falco offers flexibility. For example, you can create new rules through a rich and expressive syntax, develop and deploy new data sources that expand the scope of detections, and integrate Falco with your desired notification and event collection tools.

Simple

Simplicity is the last design choice underpinning Falco, but it’s also one of the most important ones. The Falco rule syntax is designed to be compact, easy to read, and simple to learn. Whenever possible, a Falco rule condition should fit in a single line. Anyone, not only experts, should be able to write a new rule or modify an existing one. It’s OK if this reduces the expressiveness of the syntax: Falco is in the business of delivering an efficient security rule engine, not a full-fledged domain-specific language. There are better tools for that.
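
Lists and macros are a big part of what keeps rules short. The sketch below shows both; the macro body approximates the spawned_process macro shipped in the default ruleset, and the list contents are just examples.

# Illustrative sketch of lists and macros
- list: interactive_shells
  items: [bash, sh, zsh, fish]

- macro: spawned_process
  condition: evt.type in (execve, execveat) and evt.dir = <

- rule: shell_started_sketch
  desc: an interactive shell was started
  condition: spawned_process and proc.name in (interactive_shells)
  output: Interactive shell started (shell=%proc.name parent=%proc.pname)
  priority: NOTICE
  source: syscall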

Simplicity is also evident in the process of extending Falco: alerting on a new data source or integrating with a new cloud service or container type is a matter of writing a plugin, in languages including Go, C, and C++. Falco loads these plugins easily, and you can use them to add support for new data sources or new fields to use in rules.
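
Loading a plugin takes only a few lines in falco.yaml. The excerpt below is a hedged sketch using the CloudTrail plugin as an example; the library path and configuration values are illustrative.

# Illustrative falco.yaml excerpt for loading a plugin
plugins:
  - name: cloudtrail
    library_path: libcloudtrail.so
    init_config: ""
    # open_params typically points the plugin at its input,
    # for example an S3 bucket or SQS queue
    open_params: ""

load_plugins: [cloudtrail]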

What You Can Do with Falco

Falco shines at detecting threats, intrusions, and data theft at runtime and in real time. It works well with legacy infrastructures but excels at supporting containers, Kubernetes, and cloud infrastructures. It secures both workloads (processes, containers, services) and infrastructure (hosts, VMs, network, cloud infrastructure and services). It is designed to be lightweight, efficient, and scalable and to be used in both development and production. It can detect many classes of threats, but should you need more, you can customize it. It also has a thriving community that supports it and keeps enhancing it.

What You Cannot Do with Falco

No single tool can solve all your problems. Knowing what you cannot do with Falco is as important as knowing where to use it. As with any tool, there are trade-offs. First, Falco is not a general-purpose policy language: it doesn’t offer the expressiveness of a full programming language and cannot perform correlation across different engines. Instead, its rule engine is designed to apply relatively stateless rules at high frequency in many places around your infrastructure. If you are looking for a powerful centralized policy language, we suggest you take a look at OPA.

Second, Falco is not designed to store the data it collects in a centralized repository so that you can perform analytics on it. Rule evaluation is performed at the endpoint, and only the alerts are sent to a centralized location. If your focus is advanced analytics and big data querying, we recommend that you use one of the many log collection tools available on the market.

Finally, for efficiency reasons, Falco does not inspect network payloads. Therefore, it’s not the right tool to implement layer 7 (L7) security policies. A traditional network-based intrusion detection system (IDS) or L7 firewall is a better choice for such a use case.

Background and History

The authors of this book have been part of some of Falco’s history, and this final section presents our memories and perspectives. If you are interested only in operationalizing Falco, feel free to skip the rest of this chapter. However, we believe that knowing where Falco comes from can give you useful context for its architecture that will ultimately help you use it better. Plus, it’s a fun story!

Network Packets: BPF, libpcap, tcpdump, and Wireshark

During the height of the late-1990s internet boom, computer networks were exploding in popularity. So was the need to observe, troubleshoot, and secure them. Unfortunately, many operators couldn’t afford the network visibility tools available at that time, which were all commercially offered and very expensive. As a consequence, a lot of people were fumbling around in the dark.

Soon, teams around the world started working on solutions to this problem. Some involved extending existing operating systems to add packet capture functionality: in other words, making it possible to convert an off-the-shelf computer workstation into a device that could sit on a network and collect all the packets sent or received by other workstations. One such solution, Berkeley Packet Filter (BPF), developed by Steven McCanne and Van Jacobson at the University of California at Berkeley, was designed to extend the BSD operating system kernel. If you use Linux, you might be familiar with eBPF, a virtual machine that can be used to safely execute arbitrary code in the Linux kernel (the e stands for extended). eBPF is one of the hottest modern features of the Linux kernel. It’s evolved into an extremely powerful and flexible technology after many years of improvements, but it started as a little programmable packet capture and filtering module for BSD Unix.

BPF came with a library called libpcap that any program could use to capture raw network packets. Its availability triggered a proliferation of networking and security tools. The first tool based on libpcap was a command-line network analyzer called tcpdump, which is still part of virtually any Unix distribution. In 1998, however, a GUI-based open source protocol analyzer called Ethereal (renamed Wireshark in 2006) was launched. It became, and still is, the industry standard for packet analysis.

What tcpdump, Wireshark, and many other popular networking tools have in common is the ability to access a data source that is rich, accurate, and trustworthy and can be collected in a noninvasive way: raw network packets. Keep this concept in mind as you continue reading!

Snort and Packet-Based Runtime Security

Introspection tools like tcpdump and Wireshark were the natural early applications of the BPF packet capture stack. However, people soon started getting creative in their use cases for packets. For example, in 1998, Martin Roesch released an open source network intrusion detection tool called Snort. Snort is a rule engine that processes packets captured from the network. It has a large set of rules that can detect threats and unwanted activity by looking at packets, the protocols they contain, and the payloads they carry. It inspired the creation of similar tools such as Suricata and Zeek.

What makes tools like Snort powerful is their ability to validate the security of networks and applications while applications are running. This is important because it provides real-time protection, and the focus on runtime behavior makes it possible to detect threats based on vulnerabilities that have not yet been disclosed.

The Network Packets Crisis

You’ve just seen what made network packets popular as a data source for visibility, security, and troubleshooting. Applications based on them spawned several successful industries. However, trends arose that eroded packets’ usefulness as a source of truth:

  • Collecting packets in a comprehensive way became more and more complicated, especially in environments like the cloud, where access to routers and network infrastructure is limited.

  • Encryption and network virtualization made it more challenging to extract valuable information.

  • The rise of containers and orchestrators like Kubernetes made infrastructures more elastic. At the same time, it became more complicated to reliably collect network data.

These issues started becoming clear in the early 2010s, with the popularity of cloud computing and containers. Once again, an exciting new ecosystem was unfolding, but no one quite knew how to troubleshoot and secure it.

System Calls as a Data Source: sysdig

That’s where your authors come in. We released an open source tool called sysdig, inspired by a set of questions: What is the best way to provide visibility for modern cloud native applications? Can we apply workflows built on top of packet capture to this new world? What is the best data source?

sysdig originally focused on collecting system calls from the kernel of the operating system. System calls are a rich data source—even richer than packets—because they don’t exclusively focus on network data: they include file I/O, command execution, interprocess communication, and more. They are a better data source for cloud native environments than packets, because they can be collected from the kernel for both containers and cloud instances. Plus, collecting them is easy, efficient, and minimally invasive.

sysdig was initially composed of three separate components:

  • A kernel capture probe (available in two flavors, kernel module and eBPF)

  • A set of libraries to facilitate the development of capture programs

  • A command-line tool with decoding and filtering capabilities

In other words, it was porting the BPF stack to system calls. sysdig was engineered to support the most popular network packet workflows: trace files, easy filtering, scriptability, and so on. From the beginning, we also included native integrations with Kubernetes and other orchestrators, with the goal of making them useful in modern environments. sysdig immediately became very popular with the community, validating the technical approach.

Falco

So what would be the next logical step? You guessed it: a Snort-like tool for system calls! A flexible rule engine on top of the sysdig libraries, we thought, would be a powerful tool to detect anomalous behavior and intrusions in modern apps reliably and efficiently—essentially the Snort approach but applied to system calls and designed to work in the cloud.

So, that’s how Falco was born. The first (rather simple) version was released at the end of 2016 and included most of the important components, such as the rule engine. Falco’s rule engine was inspired by Snort’s but designed to operate on a much richer and more generic dataset and was plugged into the sysdig libraries. It shipped with a relatively small but useful set of rules. This initial version of Falco was largely a single-machine tool, with no ability to be deployed in a distributed way. We released it as open source because we saw a broad community need for it and, of course, because we love open source!

Expanding into Kubernetes

As the tool evolved and the community embraced it, Falco’s developers expanded it into new domains of applicability. For example, in 2018 we added Kubernetes audit logs as a data source. This feature lets Falco tap into the stream of events produced by the audit log and detect misconfigurations and threats as they happen.

Creating this feature required us to improve the engine, which made Falco more flexible and better suited to a broader range of use cases.

Joining the Cloud Native Computing Foundation

In 2018 Sysdig contributed Falco to the Cloud Native Computing Foundation (CNCF) as a sandbox project. The CNCF is the home of many important projects at the foundation of modern cloud computing, such as Kubernetes, Prometheus, Envoy, and OPA. For our team, making Falco part of the CNCF was a way to evolve it into a truly community-driven effort, to make sure it would be flawlessly integrated with the rest of the cloud native stack, and to guarantee long-term support for it. In 2021 this effort was expanded by the contribution of the sysdig kernel module, eBPF probe, and libraries to the CNCF, as a subproject in the Falco organization. The full Falco stack is now in the hands of a neutral and caring community.

Plugins and the Cloud

As years passed and Falco matured, a couple of things became clear. First, its sophisticated engine, efficient nature, and ease of deployment make it suitable for much more than system call–based runtime security. Second, as software becomes more and more distributed and complex, runtime security is paramount to immediately detecting threats, both expected and unexpected. Finally, we believe that the world needs a consistent, standardized way to approach runtime security. In particular, there is great demand for a solution that can protect workloads (processes, containers, services, applications) and infrastructure (hosts, networks, cloud services) in a converged way.

As a consequence, the next step in the evolution of Falco was adding modularity, flexibility, and support for many more data sources spanning different domains. For example, in 2021 a new plugin infrastructure was added that allows Falco to tap into data sources like cloud provider logs to detect misconfigurations, unauthorized access, data theft, and much more.

A Long Journey

Falco’s story stretches across more than two decades and links many people, inventions, and projects that at first glance don’t appear related. In our opinion, this story exemplifies why open source is so cool: becoming a contributor lets you learn from the smart people who came before you, build on top of their innovations, and connect communities in creative ways.
