Chapter 1. Introducing Falco
The goal of this first chapter of the book is to explain what Falco is. Don't worry, we'll take it easy! We will first look at what Falco does, including a high-level view of its functionality and an introductory description of each of its components. We'll explore the design principles that inspired Falco and still guide its development today. We'll then discuss what you can do with Falco, what is outside its domain, and what you can better accomplish with other tools. Finally, we'll provide some historical context to put things into perspective.
Falco in a Nutshell
At the highest level, Falco is pretty straightforward: you deploy it by installing multiple sensors across a distributed infrastructure. Each sensor collects data (from the local machine or by talking to some API), runs a set of rules against it, and notifies you if something bad happens. Figure 1-1 shows a simplified diagram of how it works.
You can think of Falco like a network of security cameras for your infrastructure: you place the sensors in key locations, they observe what's going on, and they ping you if they detect harmful behavior. With Falco, bad behavior is defined by a set of rules that the community created and maintains for you and that you can customize or extend for your needs. The alerts generated by your fleet of Falco sensors can theoretically stay on the local machine, but in practice they are typically exported to a centralized collector. For centralized alert collection, you can use a general-purpose security information and event management (SIEM) tool or a specialized tool like Falcosidekick. (We'll cover alert collection extensively in Chapter 12.)
Now letâs dig a little deeper into the Falco architecture and explore its main components, starting with the sensors.
Sensors
Figure 1-2 shows how Falco sensors work.
The sensor consists of an engine that has two inputs: a data source and a set of rules. The sensor applies the rules to each event coming from the data source. When a rule matches an event, an output message is produced. Very straightforward, right?
Data Sources
Each sensor is able to collect input data from a number of sources. Originally, Falco was designed to exclusively operate on system calls, which to date remain one of its most important data sources. We'll cover system calls in detail in Chapters 3 and 4, but for now you can think of them as what a running program uses to interface with its external world. Opening or closing a file, establishing or receiving a network connection, reading and writing data to and from the disk or the network, executing commands, and communicating with other processes using pipes or other types of interprocess communication are all examples of system call usage.
Falco collects system calls by instrumenting the kernel of the Linux operating system (OS). It can do this in two different ways: deploying a kernel module (i.e., a piece of executable code that can be installed in the operating system kernel to extend the kernel's functionality) or using a technology called eBPF, which allows scripts to safely perform actions inside the OS. We'll talk extensively about kernel modules and eBPF in Chapter 4.
Tapping into this data gives Falco incredible visibility into everything that is happening in your infrastructure. Here are some examples of things Falco can detect for you (a sample rule sketch follows the list):
- Privilege escalations
- Access to sensitive data
- Ownership and mode changes
- Unexpected network connections or socket mutations
- Unwanted program execution
- Data exfiltration
- Compliance violations
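To make one of these detections concrete, here is a minimal sketch of a rule that flags reads of the /etc/shadow credential file. Rules are covered properly in the next section and in Chapters 7 and 13; this one is a simplified illustration rather than one of the rules that ship with Falco, although the fields it uses (evt.type, evt.is_open_read, fd.name, and so on) are real Falco fields.

# Illustrative only: a simplified detection for access to sensitive data
- rule: read_sensitive_file
  desc: a process opened /etc/shadow for reading
  condition: evt.type in (open, openat) and evt.is_open_read=true and fd.name startswith /etc/shadow
  output: sensitive file opened for reading (user=%user.name command=%proc.cmdline file=%fd.name)
  priority: WARNING
  source: syscall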
Falco has also been extended to tap into other data sources besides system calls (we'll show you examples throughout the book). For example, Falco can monitor your cloud logs in real time and notify you when something bad happens in your cloud infrastructure. Here are some more examples of things it can detect for you (a sketch of one such rule follows the list):
- When a user logs in without multifactor authentication
- When a cloud service configuration is modified
- When somebody accesses one or more sensitive files in an Amazon Web Services (AWS) S3 bucket
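The same compact rule syntax applies to cloud data sources. As a hedged sketch of the last item in the list, written against Falco's CloudTrail plugin, the rule below flags reads from a sensitive S3 bucket. The field names (ct.name, ct.user, s3.bucket, s3.uri) and the source name aws_cloudtrail reflect the plugin as we understand it but may differ across plugin versions, and the bucket name is a placeholder; consult the plugin documentation before using something like this.

# Illustrative only: field names and the bucket name are placeholders to be
# checked against your version of the CloudTrail plugin.
- rule: sensitive_bucket_access
  desc: an object was read from a bucket that holds sensitive data
  condition: ct.name = "GetObject" and s3.bucket = "my-sensitive-bucket"
  output: sensitive S3 object accessed (user=%ct.user object=%s3.uri)
  priority: WARNING
  source: aws_cloudtrail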
New data sources are added to Falco frequently, so we recommend checking the website and Slack channel to keep up with what's new.
Rules
Rules tell the Falco engine what to do with the data coming from the sources. They allow the user to define policies in a compact and readable format. Falco comes preloaded with a comprehensive set of rules that cover host, container, Kubernetes, and cloud security, and you can easily create your own rules to customize it. We'll spend a lot of time on rules, in particular in Chapters 7 and 13; by the time you're done reading this book, you'll be a total master at them. Here's an example to whet your appetite:
- rule: shell_in_container
  desc: shell opened inside a container
  condition: spawned_process and container.id != host and proc.name = bash
  output: shell in a container (user=%user.name container_id=%container.id)
  source: syscall
  priority: WARNING
This rule detects when a bash shell is started inside a container, which is normally not a good thing in an immutable container-based infrastructure. The core entries in a rule are the condition, which tells Falco what to look at, and the output, which is what Falco will tell you when the condition triggers. As you can see, both the condition and the output act on fields, one of the core concepts in Falco. The condition is a Boolean expression that combines checks of fields against values (essentially, a filter). The output is a combination of text and field names, whose values will be printed out in the notification. Its syntax is similar to that of a print statement in a programming language.
Does this remind you of networking tools like tcpdump or Wireshark? Good eye: they were a big inspiration for Falco.
Data Enrichment
Rich data sources and a flexible rule engine help make Falco a powerful runtime security tool. On top of that, metadata from a disparate set of providers enriches its detections.
When Falco tells you that something has happened (for example, that a system file has been modified), you typically need more information to understand the cause and the scope of the issue. Which process did this? Did it happen in a container? If so, what were the container and image names? What was the service/namespace where this happened? Was it in production or in dev? Was this a change made by root?
Falco's data enrichment engine helps answer all of these questions by building up the environment state, including running processes and threads, the files they have open, the containers and Kubernetes objects they run in, etc. All of this state is accessible to Falco's rules and outputs. For example, you can easily scope a rule so that it triggers only in production or in a specific service.
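To make this concrete, here is a hedged variant of the shell_in_container rule from earlier that uses enrichment fields both to scope the condition to a specific Kubernetes namespace and to add context to the output. The fields (k8s.ns.name, k8s.pod.name, container.image.repository) are standard Falco fields; the namespace name is a placeholder.

# Illustrative variant of the earlier rule, scoped with enrichment fields
- rule: shell_in_production_container
  desc: shell opened inside a container in the production namespace
  condition: >
    spawned_process and container.id != host and proc.name = bash
    and k8s.ns.name = "production"
  output: >
    shell in a production container
    (user=%user.name pod=%k8s.pod.name image=%container.image.repository)
  priority: WARNING
  source: syscall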
Output Channels
Every time a rule is triggered, the corresponding engine emits an output notification. In the simplest possible configuration, the engine writes the notification to standard output (which, as you can imagine, usually isn't very useful). Fortunately, Falco offers sophisticated ways to route outputs and direct them to a bunch of places, including log collection tools, cloud storage services like S3, and communication tools like Slack and email. Its ecosystem includes a fantastic project called Falcosidekick, specifically designed to connect Falco to the world and make output collection effortless (see Chapter 12 for more on this).
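As a minimal sketch, the falco.yaml fragment below enables JSON output, keeps a local copy of the alerts in a file, and forwards them over HTTP. The json_output, file_output, and http_output keys are part of Falco's configuration file, but the file path and the Falcosidekick URL are placeholders (they assume Falcosidekick is reachable under that name on its default port); adjust them for your environment.

# Fragment of falco.yaml (the path and URL below are placeholders)
json_output: true

file_output:
  enabled: true
  filename: /var/log/falco_events.json

http_output:
  enabled: true
  url: http://falcosidekick:2801/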
Containers and More
Falco was designed for the modern world of cloud native applications, so it has excellent out-of-the-box support for containers, Kubernetes, and the cloud. Since this book is about cloud native security, we will mostly focus on that, but keep in mind that Falco is not limited to containers and Kubernetes running in the cloud. You can absolutely use it as a host security tool, and many of its preloaded rules can help you secure your fleet of Linux servers. Falco also has good support for network detection, allowing you to inspect the activity of connections, IP addresses, ports, clients, and servers and receive alerts when they show unwanted or atypical behavior.
Falco's Design Principles
Now that you understand what Falco does, let's talk about why it is the way it is. When you're developing a piece of software of non-negligible complexity, it's important to focus on the right use cases and prioritize the most important goals. Sometimes that means accepting trade-offs. Falco is no exception. Its development has been guided by a core set of principles. In this section we will explore why they were chosen and how each of them affects Falco's architecture and feature set. Understanding these principles will allow you to judge whether Falco is a good fit for your use cases and help you get the most out of it.
Specialized for Runtime
The Falco engine is designed to detect threats while your services and applications are running. When it detects unwanted behavior, Falco should alert you instantly (at most in a matter of seconds) so you're informed (and can react!) right away, not after minutes or hours have passed.
This design principle manifests in three important architectural choices. First, the Falco engine is engineered as a streaming engine, able to process data quickly as it arrives rather than storing it and acting on it later. Second, it's designed to evaluate each event independently, not to generate alerts based on a sequence of events; this means correlating different events, even if feasible, is not a primary goal and is in fact discouraged. Third, Falco evaluates rules as close as possible to the data source. If possible, it avoids transporting information before processing it and favors deploying richer engines on the endpoints.
Suitable for Production
You should be able to deploy Falco in any environment, including production environments where stability and low overhead are of paramount importance. It should not crash your apps and should strive to slow them down as little as possible.
This design principle affects the data collection architecture, particularly when Falco runs on endpoints that have many processes or containers. Falco's drivers (the kernel module and eBPF probe) have undergone many iterations and years of testing to guarantee their performance and stability. Collecting data by tapping into the kernel of the operating system, as opposed to instrumenting the monitored processes/containers, guarantees that your applications won't crash because of bugs in Falco.
The Falco engine is written in C++ and employs many expedients to reduce resource consumption. For example, it avoids processing system calls that read or write disk or network data. In some ways this is a limitation, because it prevents users from creating rules that inspect the content of payloads, but it also ensures that CPU and memory consumption stay low, which is more important.
Intent-Free Instrumentation
Falco is designed to observe application behavior without requiring users to recompile applications, install libraries, or rebuild containers with monitoring hooks. This is very important in modern containerized environments, where applying changes to every component would require an unrealistic amount of work. It also guarantees that Falco sees every process and container, no matter where it comes from, who runs it, or how long it's been around.
Optimized to Run at the Edge
Compared to other policy engines (for example, OPA), Falco has been explicitly designed with a distributed, multisensor architecture in mind. Its sensors are designed to be lightweight, efficient, and portable, and to operate in diverse environments. It can be deployed on a physical host, in a virtual machine, or as a container. The Falco binary is built for multiple platforms, including ARM.
Avoids Moving and Storing a Ton of Data
Most currently marketed threat detection products are based on sending large numbers of events to a centralized SIEM tool and then performing analytics on top of the collected data. Falco is designed around a very different principle: stay as close as possible to the endpoint, perform detections in place, and only ship alerts to a centralized collector. This approach results in a solution that is a bit less capable at performing complex analytics, but is simple to operate, much more cost-effective, and scales very well horizontally.
Scalable
Speaking of scale, another important design goal underlying Falco is that it should be able to scale to support the biggest infrastructures in the world. If you can run it, Falco should be able to secure it. As we've just described, keeping limited state and avoiding centralized storage are important elements of this. Edge computing is an important element too, since distributing rule evaluation is the only approach to scale a tool like Falco in a truly horizontal way.
Another key part of scalability is endpoint instrumentation. Falco's data collection stack doesn't use techniques like sidecars, library linking, or process instrumentation. The reason is that the resource utilization of all of these techniques grows with the number of containers, libraries, or processes to monitor. Busy machines have many containers, libraries, and processes (too many for these techniques to work), but they have only one operating system kernel. Capturing system calls in the kernel means that you need only one Falco sensor per machine, no matter how big the machine is. This makes it possible to run Falco on big hosts with a lot of activity.
Robust Defaults, Richly Extensible
Another key design goal was minimizing the time it takes to extract value from Falco. You should be able to do this by just installing it; you shouldn't need to customize it unless you have advanced requirements.
Whenever the need for customization does arise, though, Falco offers flexibility. For example, you can create new rules through a rich and expressive syntax, develop and deploy new data sources that expand the scope of detections, and integrate Falco with your desired notification and event collection tools.
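As a sketch of what such customization can look like, the following local rules file defines a list and a macro and uses them in a new rule. Lists, macros, and rules are the three building blocks of Falco's rule files; the names and image values below are invented for illustration.

# local_rules.yaml: an illustrative customization (names and images are made up)
- list: allowed_shell_images
  items: [docker.io/library/busybox, docker.io/library/alpine]

- macro: interactive_shell
  condition: proc.name in (bash, sh, zsh)

- rule: shell_in_unapproved_image
  desc: an interactive shell started in a container whose image is not in the allowed list
  condition: >
    spawned_process and container.id != host and interactive_shell
    and not container.image.repository in (allowed_shell_images)
  output: shell in unapproved image (user=%user.name image=%container.image.repository)
  priority: WARNING
  source: syscall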
Simple
Simplicity is the last design choice underpinning Falco, but it's also one of the most important ones. The Falco rule syntax is designed to be compact, easy to read, and simple to learn. Whenever possible, a Falco rule condition should fit in a single line. Anyone, not only experts, should be able to write a new rule or modify an existing one. It's OK if this reduces the expressiveness of the syntax: Falco is in the business of delivering an efficient security rule engine, not a full-fledged domain-specific language. There are better tools for that.
Simplicity is also evident in the processes for extending Falco to alert on new data sources and integrating it with a new cloud service or type of container, which is a matter of writing a plugin in any language, including Go, C, and C++. Falco loads these plugins easily, and you can use them to add support for new data sources or new fields to use in rules.
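As a rough sketch of how a plugin gets wired in, here is a falco.yaml fragment that registers the CloudTrail plugin and enables it. The plugins and load_plugins keys are part of Falco's configuration, while the library path and the S3 location in open_params are placeholders; refer to the plugin's documentation for the parameters it actually accepts.

# Fragment of falco.yaml (library path and open_params are placeholders)
plugins:
  - name: cloudtrail
    library_path: libcloudtrail.so
    init_config: ""
    open_params: "s3://my-cloudtrail-bucket/AWSLogs/"

load_plugins: [cloudtrail]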
What You Can Do with Falco
Falco shines at detecting threats, intrusions, and data theft at runtime and in real time. It works well with legacy infrastructures but excels at supporting containers, Kubernetes, and cloud infrastructures. It secures both workloads (processes, containers, services) and infrastructure (hosts, VMs, network, cloud infrastructure and services). It is designed to be lightweight, efficient, and scalable and to be used in both development and production. It can detect many classes of threats, but should you need more, you can customize it. It also has a thriving community that supports it and keeps enhancing it.
What You Cannot Do with Falco
No single tool can solve all your problems. Knowing what you cannot do with Falco is as important as knowing where to use it. As with any tool, there are trade-offs. First, Falco is not a general-purpose policy language: it doesn't offer the expressiveness of a full programming language and cannot perform correlation across different engines. Instead, its rule engine is designed to apply relatively stateless rules at high frequency in many places around your infrastructure. If you are looking for a powerful centralized policy language, we suggest you take a look at OPA.
Second, Falco is not designed to store the data it collects in a centralized repository so that you can perform analytics on it. Rule evaluation is performed at the endpoint, and only the alerts are sent to a centralized location. If your focus is advanced analytics and big data querying, we recommend that you use one of the many log collection tools available on the market.
Finally, for efficiency reasons, Falco does not inspect network payloads. Therefore, it's not the right tool to implement layer 7 (L7) security policies. A traditional network-based intrusion detection system (IDS) or L7 firewall is a better choice for such a use case.
Background and History
The authors of this book have been part of some of Falco's history, and this final section presents our memories and perspectives. If you are interested only in operationalizing Falco, feel free to skip the rest of this chapter. However, we believe that knowing where Falco comes from can give you useful context for its architecture that will ultimately help you use it better. Plus, it's a fun story!
Network Packets: BPF, libpcap, tcpdump, and Wireshark
During the height of the late-1990s internet boom, computer networks were exploding in popularity. So was the need to observe, troubleshoot, and secure them. Unfortunately, many operators couldn't afford the network visibility tools available at that time, which were all commercially offered and very expensive. As a consequence, a lot of people were fumbling around in the dark.
Soon, teams around the world started working on solutions to this problem. Some involved extending existing operating systems to add packet capture functionality: in other words, making it possible to convert an off-the-shelf computer workstation into a device that could sit on a network and collect all the packets sent or received by other workstations. One such solution, Berkeley Packet Filter (BPF), developed by Steven McCanne and Van Jacobson at the University of California at Berkeley, was designed to extend the BSD operating system kernel. If you use Linux, you might be familiar with eBPF, a virtual machine that can be used to safely execute arbitrary code in the Linux kernel (the e stands for extended). eBPF is one of the hottest modern features of the Linux kernel. It's evolved into an extremely powerful and flexible technology after many years of improvements, but it started as a little programmable packet capture and filtering module for BSD Unix.
BPF came with a library called libpcap that any program could use to capture raw network packets. Its availability triggered a proliferation of networking and security tools. The first tool based on libpcap was a command-line network analyzer called tcpdump, which is still part of virtually any Unix distribution. In 1998, however, a GUI-based open source protocol analyzer called Ethereal (renamed Wireshark in 2006) was launched. It became, and still is, the industry standard for packet analysis.
What tcpdump, Wireshark, and many other popular networking tools have in common is the ability to access a data source that is rich, accurate, and trustworthy and can be collected in a noninvasive way: raw network packets. Keep this concept in mind as you continue reading!
Snort and Packet-Based Runtime Security
Introspection tools like tcpdump and Wireshark were the natural early applications of the BPF packet capture stack. However, people soon started getting creative in their use cases for packets. For example, in 1998, Martin Roesch released an open source network intrusion detection tool called Snort. Snort is a rule engine that processes packets captured from the network. It has a large set of rules that can detect threats and unwanted activity by looking at packets, the protocols they contain, and the payloads they carry. It inspired the creation of similar tools such as Suricata and Zeek.
What makes tools like Snort powerful is their ability to validate the security of networks and applications while applications are running. This is important because it provides real-time protection, and the focus on runtime behavior makes it possible to detect threats based on vulnerabilities that have not yet been disclosed.
The Network Packets Crisis
You've just seen what made network packets popular as a data source for visibility, security, and troubleshooting. Applications based on them spawned several successful industries. However, trends arose that eroded packets' usefulness as a source of truth:
- Collecting packets in a comprehensive way became more and more complicated, especially in environments like the cloud, where access to routers and network infrastructure is limited.
- Encryption and network virtualization made it more challenging to extract valuable information.
- The rise of containers and orchestrators like Kubernetes made infrastructures more elastic. At the same time, it became more complicated to reliably collect network data.
These issues started becoming clear in the early 2010s, with the popularity of cloud computing and containers. Once again, an exciting new ecosystem was unfolding, but no one quite knew how to troubleshoot and secure it.
System Calls as a Data Source: sysdig
That's where your authors come in. We released an open source tool called sysdig, which we were inspired to build by a set of questions: What is the best way to provide visibility for modern cloud native applications? Can we apply workflows built on top of packet capture to this new world? What is the best data source?
sysdig originally focused on collecting system calls from the kernel of the operating system. System calls are a rich data source (even richer than packets) because they don't exclusively focus on network data: they include file I/O, command execution, interprocess communication, and more. They are a better data source for cloud native environments than packets, because they can be collected from the kernel for both containers and cloud instances. Plus, collecting them is easy, efficient, and minimally invasive.
sysdig was initially composed of three separate components:
- A kernel capture probe (available in two flavors, kernel module and eBPF)
- A set of libraries to facilitate the development of capture programs
- A command-line tool with decoding and filtering capabilities
In other words, sysdig ported the BPF stack to system calls. It was engineered to support the most popular network packet workflows: trace files, easy filtering, scriptability, and so on. From the beginning, we also included native integrations with Kubernetes and other orchestrators, with the goal of making these workflows useful in modern environments. sysdig immediately became very popular with the community, validating the technical approach.
Falco
So what would be the next logical step? You guessed it: a Snort-like tool for system calls! A flexible rule engine on top of the sysdig libraries, we thought, would be a powerful tool to detect anomalous behavior and intrusions in modern apps reliably and efficiently: essentially the Snort approach, but applied to system calls and designed to work in the cloud.
So, that's how Falco was born. The first (rather simple) version was released at the end of 2016 and included most of the important components, such as the rule engine. Falco's rule engine was inspired by Snort's but designed to operate on a much richer and more generic dataset and was plugged into the sysdig libraries. It shipped with a relatively small but useful set of rules. This initial version of Falco was largely a single-machine tool, with no ability to be deployed in a distributed way. We released it as open source because we saw a broad community need for it and, of course, because we love open source!
Expanding into Kubernetes
As the tool evolved and the community embraced it, Falco's developers expanded it into new domains of applicability. For example, in 2018 we added Kubernetes audit logs as a data source. This feature lets Falco tap into the stream of events produced by the audit log and detect misconfigurations and threats as they happen.
Creating this feature required us to improve the engine, which made Falco more flexible and better suited to a broader range of use cases.
Joining the Cloud Native Computing Foundation
In 2018 Sysdig contributed Falco to the Cloud Native Computing Foundation (CNCF) as a sandbox project. The CNCF is the home of many important projects at the foundation of modern cloud computing, such as Kubernetes, Prometheus, Envoy, and OPA. For our team, making Falco part of the CNCF was a way to evolve it into a truly community-driven effort, to make sure it would be flawlessly integrated with the rest of the cloud native stack, and to guarantee long-term support for it. In 2021 this effort was expanded by the contribution of the sysdig kernel module, eBPF probe, and libraries to the CNCF, as a subproject in the Falco organization. The full Falco stack is now in the hands of a neutral and caring community.
Plugins and the cloud
As years passed and Falco matured, a few things became clear. First, its sophisticated engine, efficient nature, and ease of deployment make it suitable for much more than system call-based runtime security. Second, as software becomes more and more distributed and complex, runtime security is paramount to immediately detecting threats, both expected and unexpected. Finally, we believe that the world needs a consistent, standardized way to approach runtime security. In particular, there is great demand for a solution that can protect workloads (processes, containers, services, applications) and infrastructure (hosts, networks, cloud services) in a converged way.
As a consequence, the next step in the evolution of Falco was adding modularity, flexibility, and support for many more data sources spanning different domains. For example, in 2021 a new plugin infrastructure was added that allows Falco to tap into data sources like cloud provider logs to detect misconfigurations, unauthorized access, data theft, and much more.
A long journey
Falco's story stretches across more than two decades and links many people, inventions, and projects that at first glance don't appear related. In our opinion, this story exemplifies why open source is so cool: becoming a contributor lets you learn from the smart people who came before you, build on top of their innovations, and connect communities in creative ways.