Chapter 1. The Role of ML and AI in Security

Why has there been such a sudden explosion of ML and AI in security? The truth is that these technologies have been underpinning many security tools for years. Frankly, both technologies are necessary precisely because there has been such a rapid increase in the number and complexity of attacks. These attacks carry a high cost for businesses. Recent studies predict that global annual cybercrime costs will grow from $3 trillion in 2015 to $6 trillion by 2021. This includes damage and destruction of data, stolen money, lost productivity, theft of intellectual property, theft of personal and financial data, embezzlement, fraud, post-attack disruption to the normal course of business, forensic investigation, restoration and deletion of hacked data and systems, and reputational harm.1 Global spending on cybersecurity products and services for defending against cybercrime is projected to exceed $1 trillion cumulatively from 2017 to 2021.2

The reality is that, for some time now, organizations have not been able to rely on a “set it and forget it” approach to security built on antiquated, inflexible, and static defenses. Instead, adaptive and automated security tools that rely on ML and AI under the hood are becoming the norm, and your security team must adapt to these technologies in order to succeed.

Security teams are tasked with protecting an organization’s data, operations, and people. To protect against the current attack posture of their adversaries, these teams will need increasingly advanced tools.

As the sophistication level of malicious bots and other attacks increases, traditional approaches to security, like antivirus software or basic malware detection, become less effective. In this chapter, we examine what is not working now and what will still be insufficient in the future, while laying the groundwork for the increased use of ML- and AI-based security tools and solutions.

Where Rules-Based, Signature-Based, and Firewall Solutions Fall Short

To illustrate why rules-based and signature-based security solutions are not strong enough to manage today’s attackers, consider antivirus software, which has become a staple of organizations over the past 30 years. Traditional antivirus software is rules-based, triggered to block access when recognized signature patterns are encountered. For example, if a known remote access Trojan (RAT) infects a system, the antivirus installed on the system recognizes the RAT based on a signature (generally a file hash) and stops the file from executing.
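As a rough illustration of the hash-matching idea described above, the following Python sketch checks a file against a set of known-bad hashes. The sample bytes, the `SIGNATURES` set, and the `is_blocked` function are hypothetical names invented for this example; real antivirus engines are considerably more involved. The sketch also shows why an exact hash match is brittle: changing a single byte produces a different hash.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical bytes standing in for a known RAT binary (not real malware).
KNOWN_RAT = b"MZ\x90\x00 hypothetical-rat-payload v1"

# Hypothetical signature database: hashes of files already known to be bad.
SIGNATURES = {sha256_hex(KNOWN_RAT)}


def is_blocked(file_bytes: bytes) -> bool:
    """Block the file only when its hash exactly matches a known signature."""
    return sha256_hex(file_bytes) in SIGNATURES


# The same payload with one padding byte appended.
variant = KNOWN_RAT + b"\x00"

print(is_blocked(KNOWN_RAT))  # True: the hash is in the signature set
print(is_blocked(variant))    # False: the hash no longer matches
```

The lookup itself is fast and precise, but it only recognizes files that someone has already analyzed and added to the signature set.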

What the antivirus solution does not do is close off the infection point, whether that is a vulnerability in the browser, a phishing email, or some other attack vector. Unfortunately, this leaves the attacker free to strike again with a new variation of the RAT for which the victim’s antivirus solution does not currently have a signature. Antivirus software also does not account for legitimate programs being used in malicious ways. To avoid being detected by traditional antivirus software, many malware authors have switched to so-called file-less malware. This malware relies on tools already installed on the victim’s system, such as a web browser, PowerShell, or another scripting engine, to carry out its malicious commands. Because these are well-known “good” programs, the antivirus solution allows them to operate, even though they are engaging in malicious activity.

This is why many antivirus developers have shifted toward more heuristic detection methods. Rather than searching only for matching file hashes, they monitor for behaviors that are indicative of malicious code. For example, the antivirus program looks for code that writes to certain registry keys on Microsoft Windows systems or requests certain permissions on macOS devices, and it stops that activity irrespective of whether it has a signature for the malicious files.
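A minimal sketch of behavior-based detection might look like the following. The behavior names, weights, and threshold are assumptions made up for illustration and do not reflect any particular product; the point is only that the decision depends on what a process does, not on the hash of the file.

```python
from dataclasses import dataclass

# Hypothetical behaviors and weights, chosen only for illustration.
SUSPICIOUS_BEHAVIORS = {
    "writes_autorun_registry_key": 5,
    "requests_elevated_permissions": 3,
    "spawns_scripting_engine": 2,
}

# Hypothetical cutoff: block once the combined weight reaches this value.
BLOCK_THRESHOLD = 5


@dataclass
class ProcessEvent:
    process: str   # name of the process that generated the event
    behavior: str  # label for the observed behavior


def risk_score(events):
    """Sum the weights of all recognized suspicious behaviors."""
    return sum(SUSPICIOUS_BEHAVIORS.get(e.behavior, 0) for e in events)


def should_block(events):
    """Decide based on observed behavior, with no file signature involved."""
    return risk_score(events) >= BLOCK_THRESHOLD


# A trusted program (PowerShell) behaving suspiciously.
fileless = [
    ProcessEvent("powershell.exe", "spawns_scripting_engine"),
    ProcessEvent("powershell.exe", "writes_autorun_registry_key"),
]
# An unrelated, harmless event.
benign = [ProcessEvent("notepad.exe", "opens_text_file")]

print(should_block(fileless))  # True: combined weight is 7
print(should_block(benign))    # False: no suspicious behavior observed
```

Because the score is tied to activity, a well-known “good” program can still be stopped when it starts acting like malware.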

Firewalls work in a similar way. For example, if an attacker tries to telnet to almost any host on the internet, the request will most likely be blocked. This is because most security admins disable inbound telnet at the firewall. Even when the telnet daemon is running on internal systems, it is generally blocked at the firewall, meaning external attackers cannot access an internal system using telnet. Of course, attackers can use telnet to access systems that are outside of the firewall, such as routers, assuming the telnet daemon is running on those systems. This is why it is important to disable the telnet daemon directly on the devices, in addition to blocking the protocol at the firewall.

Generally, firewalls alone are inadequate to defeat today’s attacks. Traditional firewalls either block or allow traffic based on attributes such as port and direction, with no regard for the content of the traffic. This is why attackers have moved to exfiltrate stolen data using ports 80 and 443 (HTTP and HTTPS, respectively). Almost every organization has to allow outbound traffic on these ports; otherwise, people in that organization cannot do their jobs. The attackers know this, and they’ll normally open their backdoors and establish command and control communications with their victims using ports 80 and 443. As a result, data can be stolen out of the network through the firewall.
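The following sketch models a simple port-based filter to make this concrete. The `Packet` class, the `ALLOWED` rule set, and the sample payloads are hypothetical and greatly simplified compared with a real firewall; the sketch only assumes that the decision is made on direction and destination port.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    direction: str  # "inbound" or "outbound"
    dst_port: int   # destination port number
    payload: bytes  # content, which the rule set below never inspects


# Hypothetical rule set: only outbound web traffic is permitted.
ALLOWED = {("outbound", 80), ("outbound", 443)}


def firewall_allows(packet: Packet) -> bool:
    """Allow or block using direction and port only; payload is ignored."""
    return (packet.direction, packet.dst_port) in ALLOWED


telnet_in = Packet("inbound", 23, b"login attempt")
web_request = Packet("outbound", 443, b"GET /index.html")
exfiltration = Packet("outbound", 443, b"stolen customer records")

print(firewall_allows(telnet_in))     # False: inbound telnet is blocked
print(firewall_allows(web_request))   # True: ordinary web traffic
print(firewall_allows(exfiltration))  # True: same port, so it also passes
```

To this rule set, the legitimate request and the exfiltration attempt are indistinguishable, because both are outbound traffic on port 443.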

This is also the reason why phishing attacks are so rampant today. In most cases, attackers can’t get through the firewall from the outside in to attack an internal computer; therefore, they phish people and get them to do the work for them. The victims click, they are directed to a malicious site, and the return “malicious” traffic is allowed through the firewall. It’s just the way firewalls work. Most often the return traffic is an exploit for a known vulnerability and some additional code that will be executed by the victim, opening up a backdoor on the system.

In comparison, when firewalls are deployed in front of websites and applications, organizations must leave ports 80 and 443 wide open to the internet. These ports must be opened “inbound” so that users on the internet can access the services running on the downstream servers and applications. Because these ports must be left open to support web services, inbound attacks and malware exploits, among other threats, pass through the firewall undetected. In this case, firewalls provide little, if any, inbound protection.

When it comes to malicious bots and other more sophisticated threats targeting web applications, traditional approaches such as using firewalls do not work, because the attackers know how to get around them. Today’s advanced malicious actors can find an access path that can easily defeat rule- and signature-based security platforms. Attackers understand how traditional security technologies work and use this knowledge to their advantage.

Preparing for Unexpected Attacks

Every website, router, or server is, in one way or another, potentially vulnerable to attacks. Although there is a lot of hype around zero-day attacks (those attacks that were previously unknown or unpublished), most attackers take advantage of published vulnerabilities. Attackers can react quickly to newly reported vulnerabilities, often writing exploit code within hours of a new vulnerability being announced. Most often, attackers learn of vulnerabilities from the National Vulnerability Database (NVD) website, from vendor notifications and patch availability announcements, or by discovering vulnerabilities on their own.

It then becomes a race between the attackers launching active exploits against a known vulnerability and an organization being able to patch that vulnerability. Unfortunately, it is usually easier to write an exploit than it is to quickly patch newly discovered vulnerabilities. Organizations must go through myriad tests and patch deployment approvals prior to installing the patch. This is what led to the well-known Equifax breach. The vulnerability that affected Equifax was already known; a patch was available, but the patch was not deployed.

With attacks like this, signature-based security solutions work only when they have a signature for a certain exploit looking to take advantage of a known vulnerability. If a signature is not specifically created for an exploit, a signature-based security solution cannot “develop one on its own.” Human intervention is needed. In addition, every security technology vendor will race against time to develop a signature and apply it as a rule to its technology to catch and stop a known exploit. As a result, attackers tweak their exploits and create slightly different variants designed to defeat signature-based approaches. This is one of the reasons why there are massive numbers of malware variants today.

Software vendors often win the race against attackers by announcing to their customers that a vulnerability has been found and then quickly making a patch available. In some cases, this takes longer than in others, depending on the critical nature of the vulnerability or the amount of time it takes to develop a patch. And, in the case of the Equifax breach, human error intervened when someone simply forgot to apply the needed patch that likely would have stopped the breach.

In contrast to the more traditional “after-the-fact” approaches to security that we just discussed, ML and AI provide a nonlinear way to identify attacks, looking beyond simple signatures, identifying similarities to what has happened before, and flagging things that appear to be anomalies. The following chapter discusses ML and AI defenses in more detail.
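To give a feel for what “flagging things that appear to be anomalies” can mean, here is a deliberately simple statistical sketch. It uses a z-score against a baseline rather than a trained ML model, and the request counts, the threshold, and the `find_anomalies` function are hypothetical values and names chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(baseline, observed, threshold=3.0):
    """Return observed values that deviate strongly from the baseline.

    A value is flagged when it lies more than `threshold` sample standard
    deviations away from the baseline mean (a basic z-score test).
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return []  # no variation in the baseline, so no z-score is defined
    return [x for x in observed if abs(x - mu) / sigma > threshold]


# Hypothetical requests-per-minute from one client during normal operation.
baseline = [42, 38, 45, 40, 41, 39, 44, 43]

# Hypothetical new observations, one of which is a sudden burst.
observed = [41, 46, 400, 37]

print(find_anomalies(baseline, observed))  # [400]
```

No signature for the burst of 400 requests exists anywhere in this code; it is flagged only because it looks unlike what came before. ML-based tools apply the same principle with far richer features and models.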

In subsequent chapters, this report introduces the sometimes-confusing concepts of ML and AI, provides an overview of the threat that is posed by automated bots, and discusses ways that security teams can use ML and AI to better protect their organization from malicious bots and other threats.

1 Cybersecurity Ventures Annual Crime Report

2 Cybersecurity Market Report; published quarterly by Cybersecurity Ventures; 2018
