Introduction
Open source software—the code of which is publicly available to scrutinize and typically free to use—is awesome. As consumers, it spares us the need to reinvent the wheel, letting us focus on our core functionality and dramatically boosting our productivity. As authors, it lets us share our work, gaining community love, building up a reputation, and at times having an actual impact on the way software works.
Because it’s so amazing, open source usage has skyrocketed. Practically every organization out there, from mom-and-pop shops to banks and governments, relies on open source to operate its technical stack and, by extension, its business. Tools and best practices have evolved to make such consumption ever easier, pulling down vast amounts of functionality with a single line of code or a single terminal command.
Unfortunately, using open source also carries substantial risks. We’re relying on this crowdsourced code, written by strangers, to operate mission-critical systems. More often than not, we do so with little or no scrutiny, barely aware of what we’re using and completely blind to its pedigree.
Every library you use carries multiple potential pitfalls. Does the library have software vulnerabilities that an attacker can exploit? Does it use a viral license that puts your intellectual property at risk? Did a malicious contributor hide malware amidst the good code?
Unlike commercial software, Free Open Source Software (FOSS) rarely offers any guarantees or warranties. As a consumer of open source software, it is your responsibility to understand and mitigate these risks.
This risk materialized in full force with the Equifax data breach announced in September 2017. The hack, which exposed extremely personal information of 143 million individuals, was possible due to a severe vulnerability in the open source Apache Struts library. This vulnerability was disclosed in March 2017, but the Equifax system in question was not patched until late July, only after the breach was discovered. Equifax was fully capable of identifying and fixing this issue earlier, preventing the massive leak, and many claim that failing to do so was negligence on the company’s part. The Equifax breach is certain to become a poster child for the importance of securing data and using open source responsibly.
Book Purpose and Target Audience
This book will help you address the risk of vulnerable open source libraries, the very thing that tripped up Equifax. As I’ll discuss throughout this book, such vulnerable dependencies are the most likely to be exploited by attackers, and you’ll need good practices and tools to protect your applications at scale.
Because the responsibility for securing applications and their libraries is shared between development (including DevOps) and application security, this book is aimed at architects and practitioners in both of these departments.
With that in mind, the next few sections further explain what is in and out of scope for this book. The remaining topics will hopefully be covered in a broader future book.
Tools Versus Libraries
Open source projects come in many shapes and forms. One somewhat oversimplified way to categorize them is to divide them into tools and libraries.
Tools are standalone entities, which can be used or run without writing an application of your own. Tools can be big or small, ranging from tiny Linux utilities, such as cat and cURL, to full and complex platforms such as Cloud Foundry or Hadoop.
Libraries hold functionality meant to be consumed inside an application. Examples include Node.js’s Express web server, Java’s OkHttp HTTP client, or the native OpenSSL TLS library. Like tools, libraries vary greatly in size, complexity, and breadth of use.
This book focuses exclusively on libraries. While some open source projects can be consumed as both a tool and a library, this book only considers the library aspect.
Application Versus Operating System Dependencies
Open source software (OSS) projects can be downloaded directly from their website or GitHub repository, but are primarily consumed through registries, which hold packaged and versioned project snapshots.
One class of registries holds operating system dependencies. For instance, Debian and Ubuntu systems use the apt registry to download utilities, Fedora and Red Hat users leverage yum, and many Mac users use Homebrew to install tools on their machines. These are often referred to as server dependencies, and updating them is typically called “patching your servers”.
Another type of registry holds software libraries primarily meant to be consumed by applications. These registries are largely language specific—for example, pip holds Python libraries, npm holds Node.js and frontend JavaScript code, and Maven serves the Java and adjacent communities.
Securing server dependencies primarily boils down to updating your dependencies by running commands such as apt-get upgrade frequently. While real-world problems are never quite this simple, securing server dependencies is far better understood than securing application dependencies. Therefore, while much of its logic applies to libraries of all types, this book focuses exclusively on application dependencies.
To learn more about securing your servers, including their dependencies, check out Lee Brotherston and Amanda Berlin’s Defensive Security Handbook (O’Reilly, 2017).
Known Vulnerabilities Versus Other Risks
There are multiple types of risks associated with consuming open source libraries, ranging from legal concerns with library licenses, through reliability concerns in stale or poorly managed projects, to libraries with malicious or compromised contributors.
However, in my opinion, the most immediate security risk lies in known vulnerabilities in open source libraries. As I’ll explain in the next chapter, these known vulnerabilities are the easiest path for attackers to exploit, and are poorly understood and handled by most organizations.
This book focuses on continuously finding, fixing, and preventing known vulnerabilities in open source libraries. Its aim is to help you understand this risk and the steps you need to take to combat it.
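To make the idea of "finding" known vulnerabilities a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular SCA tool works: the package names, advisory identifiers, and the Advisory structure are all hypothetical, and versions are assumed to be simple dot-separated numbers. Real tools rely on curated vulnerability databases and much richer version-range logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisory:
    """A hypothetical known-vulnerability record (not a real database format)."""
    package: str        # name of the affected library
    first_fixed: tuple  # first version that contains the fix, e.g. (2, 3, 5)
    identifier: str     # made-up advisory ID, for illustration only


def parse_version(text):
    """Turn '2.3.1' into (2, 3, 1). Assumes purely numeric, dot-separated versions."""
    return tuple(int(part) for part in text.split("."))


def find_vulnerable(dependencies, advisories):
    """Return (package, installed_version, advisory_id) for each vulnerable dependency.

    A dependency counts as vulnerable here if its installed version is lower
    than the first fixed version named in an advisory for that package.
    """
    findings = []
    for advisory in advisories:
        installed = dependencies.get(advisory.package)
        if installed is not None and parse_version(installed) < advisory.first_fixed:
            findings.append((advisory.package, installed, advisory.identifier))
    return findings


# Hypothetical application dependencies and advisories.
dependencies = {
    "example-web-framework": "2.3.1",
    "example-http-client": "1.8.0",
}
advisories = [
    Advisory("example-web-framework", (2, 3, 5), "EXAMPLE-0001"),
    Advisory("example-http-client", (1, 4, 0), "EXAMPLE-0002"),
]

findings = find_vulnerable(dependencies, advisories)
for package, version, advisory_id in findings:
    print(f"{package} {version} is affected by {advisory_id}")
```

In this sketch only example-web-framework is flagged, because its installed version predates the fix, while example-http-client is already past the fixed version. Running such a check continuously, rather than once, is what lets you respond when a new vulnerability is disclosed in a library you already use.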
Comparing Tools
Tools that help address vulnerable libraries are often referred to as Software Composition Analysis (SCA) tools. This acronym doesn’t represent the entire spectrum of risk (notably, it does not capture the remediation that follows the analysis), but as it’s the term used by analysts, I will use it throughout this book.
Because the tooling landscape is evolving rapidly, I will mostly avoid referencing the capabilities of specific tools, except when the tool is tightly tied to the capability in question. When naming tools, I’ll focus on ones that are either free or freemium, allowing you to vet them before use. Chapter 6 takes a higher-level perspective on evaluating tools, offering a more opinionated view of which aspects matter most when choosing a solution.
Book Outline
Now that you understand the subject matter of this book, let’s quickly review the flow:
- Chapter 1 defines and discusses known vulnerabilities and why it’s important to keep abreast of them.
- Chapters 2 through 5 explain the four logical steps in addressing known vulnerabilities in open source libraries: finding vulnerabilities, fixing them, preventing the addition of new vulnerable libraries, and responding to newly disclosed vulnerabilities.
- Chapter 6, as mentioned earlier, advances from explaining the differences between SCA tools to highlighting what I believe to be the most important attributes to focus on.
- Finally, Chapter 7 summarizes what we’ve learned, and briefly touches on topics that were not covered at length.
This book assumes that you are already familiar with the basics of using open source registries such as npm, Maven, or RubyGems. If you’re not, it’s worth reading up on one or two such ecosystems before starting on this book, to make the most of it.