Chapter 1. Incident Response Fundamentals

“In tranquillo esse quisque gubernator potest.”

(Anyone can hold the helm when the sea is calm.)

Publilius Syrus

This book is about building a playbook, a concrete set of strategies that lets your InfoSec team or Computer Security Incident Response Team (CSIRT) work efficiently and effectively. However, before you can develop a playbook, you need a team to run it and the policy backing to enforce it. If you are reading this book, chances are you are in some way involved with InfoSec and are looking to implement or improve a solid incident response plan, or are at least interested in how to develop effective security data mining queries. But before diving in, it’s important to cover a few essential foundations of security monitoring and incident response. You cannot craft a useful playbook without understanding these basic principles and having some prerequisite capabilities in place.

In this first chapter, we’ll cover the following concepts:

Characteristics of a successful CSIRT: There are a few key components needed for an efficient team.
Relationship building: Successful teams develop and maintain active relationships with internal teams and external organizations, and communicate constantly and effectively within the CSIRT.
Sharing: Mature CSIRTs are responsible for engaging both customers and industry partners to share best practices, and collaborate on and research threat intelligence.
Trusting your tools: CSIRTs are responsible for managing detection and prevention tools that provide either actionable security events or detailed, historical information, as well as a myriad of other tools and services to support investigation and analysis efforts.
Policy backbone: A clear, supported policy allows you to create the CSIRT charter (authority), scope, mission, responsibilities, and obligations.

If you already have a mature CSIRT, these steps may seem obvious. However, it’s worth recognizing that relationships, tools, policies, and techniques change over time. Predicting future security threats and attack techniques is very difficult, and about as accurate as predicting the weather. Researching, staying updated, and being agile with your monitoring and response capabilities will deliver success and improve your efficacy.

The Incident Response Team

Incident response teams clean up after intruders break in, deface, steal, disrupt, knock over, or just mess with hosts or networks. They will find out where the perpetrators came from and how they did it, and most of the time they will know exactly why. They will try to determine when it happened and for how long. They’ll know what was affected, and will figure out how to stop it from happening again. They will also share all this information with their friends. Intruders need to be right only once, no matter how many times they try; incident response teams need to be right every time.

Without a dedicated team of security professionals to protect an organization, it will be impacted, and in some cases, devastated by security threats and intrusions. Inevitably, investing in InfoSec and incident response will justify itself once a serious incident is revealed. Think of the response team as a type of insurance policy. Investing in an incident response team up front pays off in faster restoration of business and ongoing preventative measures when there are no active incidents. Although an incident response team is unlikely to provide a revenue stream, they minimize the ultimate impact and cost to an organization during and after a critical incident. Organizations concerned with protecting their data and IT systems need to have a team and a plan ready to respond. Computer Security Incident Response Teams (with any acronym: CSIRTs, CIRTs, CERTs, SIRTs, or IRTs) get business back online quickly after a security incident, and then investigate and document exactly what happened. They provide a detailed understanding of how the incident occurred, why it occurred, who was responsible, how it can be prevented from happening again, and hopefully refine future detection, mitigation, and investigation techniques.

In the event that an incident goes public, an organization’s leaders and public-facing figures must be well informed and confident about every detail. When handling the incident and taking responsibility, trust is on the line, affecting reputation and bringing potentially unwanted attention to your organization’s InfoSec problems. The public element of an incident aside, there are other fires to put out, in some cases including financial loss or significant downtime. A good incident response team and proper InfoSec controls can help determine how it happened and how to fix it—and can communicate the details to anyone that needs them.

Justify Your Existence

Incident response teams have often been compared to the network security equivalent of firefighters. Actual firefighters are responsible for putting out fires, saving lives, enforcing fire safety codes, and promoting fire safety awareness. When they’re not actively rescuing people from burning buildings and putting out fires, firefighters practice their skills, maintain equipment, and audit buildings and structures for fire safety based on established codes. All of this makes future fires less intense and easier to control and extinguish quickly.

Similar to the firefighters’ approach, CSIRTs respond to security incidents, salvage and protect data, enforce security policy, and evangelize security best practices and development. When they are not responding to incidents, they are designing better detection techniques and strategies, while researching and preparing for the latest threats. CSIRTs are an integral part of an overall InfoSec strategy. A functional InfoSec strategy specifically includes multifaceted teams tasked with risk assessment and analysis, policy development, and security operations and controls. Response teams fill the gap of responding to computer emergencies and intrusions, while working to prevent future attacks. Cleaning up after a major incident requires real work and a response team that’s capable of:

Managing and triaging a large problem
Understanding computer systems, networks, web applications, and databases
Knowing how and when to execute mitigation techniques
Engaging relevant stakeholders
Developing short-term fixes as necessary
Working with business, host, and application owners on long-term fixes
Participating in incident postmortems and after-action reviews
Determining the root cause of an issue and how to prevent a reoccurrence
Creating detailed incident write-ups and presenting to broad audiences

When not actively investigating an incident, CSIRTs work to improve and document their detection and response techniques. Teams improve by developing additional prevention methods and maintaining an updated playbook. While an incident response playbook helps you discover incident details hiding in your data, an incident response handbook tells you how to handle them.

A good handbook provides a compendium of directives for handling cases, current links to documentation, contact information for various groups, and specific procedures to follow for any number of incident types. Combining a handbook and an incident tracking system goes a long way to help satisfy audit requirements, as you can deliver precise detail on how any incident should be handled, complete with supporting evidence in the incident tracking records. It’s also very helpful when bringing new team members into the fold because it provides a guide on how to handle common cases.

Along with an effective playbook, an incident response handbook, and a mandate to protect the organization, great teams will also possess:

Adequate resources, tooling, and training opportunities for the team to remain relevant and effective
Proper documentation and understanding of what must be protected, including information like host or user identity, and logical diagrams for systems and networks
Documented and reliable relationships with other groups in the organization

Good teams don’t necessarily have all these requirements and “nice-to-haves,” but will find creative ways to protect their organization with the resources available to them. At minimum, a team needs ample log data and methods to analyze it, and accepted techniques for shutting down attackers.

Measure Up

Measuring performance of an incident response team depends on many subjective factors. Because of a deep understanding of security and hopefully broad experience, CSIRTs fill numerous small niches in an organization, and generally work to positively influence overall security posture. There are several ways to measure a team’s detection efficacy with a few simple metrics such as the following:

How long it takes to detect an incident after it initially occurred (which should be revealed in its investigation)
How long it takes to contain an incident once it has been detected
How long it takes to analyze an alert or solve an incident
How well playbook reports are performing
How many infections are blocked or avoided

Keeping track of incident cases, application or host vulnerabilities, and a historical record of incidents helps tremendously and is invaluable for proper long-term incident response. Data garnered from these tools can help to calculate the metrics just listed and measure incident response teams over time.
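As a rough illustration of the first two metrics in the list above, the following sketch computes average time to detect and time to contain from incident records. The record layout and field names (`occurred`, `detected`, `contained`) are assumptions made for this example; a real incident tracking system will export its own schema.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; the field names are assumptions for
# illustration, not the schema of any particular tracking system.
incidents = [
    {
        "occurred": datetime(2024, 3, 1, 8, 0),
        "detected": datetime(2024, 3, 1, 14, 0),
        "contained": datetime(2024, 3, 1, 16, 0),
    },
    {
        "occurred": datetime(2024, 3, 5, 9, 0),
        "detected": datetime(2024, 3, 5, 11, 0),
        "contained": datetime(2024, 3, 5, 15, 0),
    },
]


def mean_hours(records, start_field, end_field):
    """Average elapsed hours between two timestamps across all records."""
    deltas = [
        (r[end_field] - r[start_field]).total_seconds() / 3600
        for r in records
    ]
    return mean(deltas)


# Time from initial occurrence to detection, then from detection to containment.
time_to_detect = mean_hours(incidents, "occurred", "detected")
time_to_contain = mean_hours(incidents, "detected", "contained")

print(f"Mean time to detect:  {time_to_detect:.1f} hours")
print(f"Mean time to contain: {time_to_contain:.1f} hours")
```

Tracking these averages per quarter, rather than as a single lifetime figure, is one way to see whether the team is improving over time.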

Who’s Got My Back?

An incident response team cannot exist in a vacuum. Just like a firefighter doesn’t rebuild a burned-out house or calculate the possible insurance payout, an incident response team can’t bring a return to normalcy without depending on their preexisting relationships with other groups. Cultivating these relationships early, and keeping them strong, will ensure the CSIRT’s confidence in pragmatically responding to any InfoSec situation. Because there are so many other tasks to own during a serious incident, it’s not practical for one team to do it all. The response team needs to have active engagements with many groups, including:

IT, networking services, hosting/application, and database teams

Having a solid relationship with IT is the most significant factor of a CSIRT’s success. To respond properly, you need to understand the network and its architecture, as well as how complex IT systems perform their operations and the inner workings of custom software. It’s therefore imperative to partner with IT teams that work on these systems daily and have a more detailed understanding of their operating environments. IT teams such as network operations, DNS management, directory administrators, and others should be able to provide logical diagrams, details about logging, any known issues or potential vulnerabilities, attribution, and reasonable answers when it comes to questions about potentially malicious behavior on their systems. Earning the trust of the IT teams enables a better response to incidents and mitigation, and also encourages good support of your security monitoring infrastructure and its impact on IT operations.

Other InfoSec teams and management

CSIRTs can’t possibly own all the aspects of InfoSec for an organization, and because the fallout from many incidents provides fodder to drive architectural changes, it’s important to stay close with other teams that have a stake in overall security. Maintain active relationships with risk and vulnerability assessment teams, security architects, security operations teams (e.g., access control list [ACL] or firewall changers/approvers, authentication masters, and public key infrastructure [PKI] groups), and security-focused executives and leadership. These will be the teams responsible for driving the long-term fixes.

Handing off the responsibility for long-term fixes from security incidents should be inevitable, and you need the expertise of architecture teams to address the current failings that may have precipitated the incident and to help develop future protections from harm so that you can continue to focus on fighting fires. For quick remediation, having the operations teams on standby will make it easier to insert an ACL, firewall rule, or other blocking technique as necessary. Having regular contact with security-focused executives instills their trust in the response team’s capabilities, as well as providing a direct channel for communicating situational awareness, impact, and progress upward.

Internal technical support services

In the event of an incident (say, for example, a mass worm outbreak), internal technical support staff need to be updated on the current situation and armed with the proper information to respond to calls for support. If internal applications are down as a result of an incident, technical support should be aware of the outage and, if necessary, of the security implications involved. Some incidents require mass password resets, in which case the incident response team must rely on technical support services to handle the volume of potential questions and support requests. Externally facing technical support teams are often the most visible public-facing part of an organization. If a security incident is made public, technical support services may well be called for details, and will have to respond according to the relevant local disclosure laws. Ensuring that support teams understand what’s appropriate to share about an incident is a major component of the incident containment process; sharing details without the guidance of the organization’s legal group could have legal ramifications. This guidance helps prevent unnecessary or possibly damaging information about an incident from being disclosed. Technical support services should have documented and tested procedures for engaging with internal incident response, legal, and InfoSec teams when a major incident impacts a broad group of their customers. In some cases, the incident response team might advise alternative remediation procedures. For example, if a developing play indicates a new infection that requires additional forensics, the incident response team may not want a system reinstalled or tampered with until it can be investigated.

Human resources (HR) and employee relations

In most cases, CSIRTs don’t just focus on external threats; they handle internal threats as well. They are often the go-to group for internal investigations, if only because they generally have logs useful for troubleshooting and investigation. When it comes to insider threats like disgruntled employees, sabotage, abuse, or harassment, log data often comes into play as evidence. Depending on the type of incident, human resources may be involved either as the entity that initiated the investigation or as the recipient of any evidence of employee wrongdoing uncovered as the case progresses. Many security event and log data sources can be useful both to develop a timeline of activity for an incident and to profile a user’s behavior. As HR builds a case, they may request log evidence to confirm or deny a user’s behavior. CSIRTs are well suited to search for evidence supporting HR investigations, particularly when armed with rich log data sources, such as DHCP and VPN logs that can show an employee connecting to the network, web proxy logs showing where they browsed, or NetFlow logs showing any outbound connections.

Mature CSIRTs will consider possible insider threats and ways to detect them, involving HR as appropriate. CSIRTs support investigation efforts involved in incidents like disgruntled system administrators backdooring critical devices, a departing software developer downloading many times the normal volume of company source code, or fraudulent accounting activities and embezzling. Improperly handling an employee investigation can result in lawsuits or affect the livelihood of an individual. Therefore, monitoring for and taking action on malign employee behavior should not be done without proper authorization and oversight from HR and in some cases the legal departments. Notifying HR or employee relations about an incident allows them to take action based on company policy or legal regulations.
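To make the idea of a log-derived activity timeline concrete, here is a minimal sketch that merges events from several sources and orders them for one user. It assumes the VPN, DHCP, and web proxy logs have already been parsed and attributed to a username; the event fields, usernames, and addresses are all hypothetical, and real logs would each need their own parser and attribution step.

```python
from datetime import datetime

# Hypothetical, already-parsed events. In practice DHCP, VPN, proxy, and
# NetFlow logs have different formats and must be attributed to a user first.
vpn_events = [
    {"time": datetime(2024, 3, 1, 8, 2), "user": "jdoe",
     "source": "vpn", "detail": "login from 203.0.113.7"},
]
proxy_events = [
    {"time": datetime(2024, 3, 1, 8, 15), "user": "jdoe",
     "source": "proxy", "detail": "GET http://example.com/upload"},
    {"time": datetime(2024, 3, 1, 8, 5), "user": "asmith",
     "source": "proxy", "detail": "GET http://example.org/"},
]
dhcp_events = [
    {"time": datetime(2024, 3, 1, 8, 3), "user": "jdoe",
     "source": "dhcp", "detail": "lease 10.0.0.42"},
]


def build_timeline(user, *event_sources):
    """Merge events from several sources; return one user's events in time order."""
    merged = [
        event
        for source in event_sources
        for event in source
        if event["user"] == user
    ]
    return sorted(merged, key=lambda event: event["time"])


timeline = build_timeline("jdoe", vpn_events, proxy_events, dhcp_events)
for event in timeline:
    print(event["time"].isoformat(), event["source"], event["detail"])
```

As the surrounding text stresses, a query like this against employee activity should be run only with proper authorization and oversight from HR and, where required, legal.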

Public relations (PR) and corporate communications

It’s happened. Your customers’ personally identifiable information (PII) was stolen. You have evidence of the hack. Perhaps you’ve mitigated the threat, perhaps you haven’t. Who’s going to break the news to customers, inquisitive reporters, or corporate executives? Like incident response, the art of public relations is a unique skill unto itself. A balance must be struck between the amount and quality of information divulged through the proper channels, the commitments made by your organization, and the subsequent impact of any information disclosure upon your organization. A good PR relationship will allow you to directly provide an update on the details, scope, and impact of an incident. PR, in turn, can create the necessary language and disseminate information to appropriate internal or external parties. Face it: incidents happen. The sooner you can notify PR of something that may get press coverage, reflect poorly on the company, or affect your customers, the easier it is to defuse and responsibly handle the situation. Having a solid relationship with PR also means you can keep each other updated on issues that might require a joint response, or that might affect each other’s teams.

Legal departments

Rules and regulations abound describing things that can or cannot be done with data, who can view that data, how long data must be kept, how long data must not be kept, and how to properly manage and maintain data as evidence. Even more confusing, sometimes regulations differ from region to region, or customer to customer. As a CSIRT, it’s your job to understand that you may need legal approval for how you interact with data that you collect and where you collect it from. Your data retention policy needs the stamp of approval from your company’s legal counsel, in the event of a customer request for information (RFI), compliance audit, or lawsuit demanding old log data. Your legal counsel is not likely to be technical, nor understand in detail the data or systems involved. Ensure they’re aware of your use cases, as opposed to them determining for you how you may use data. This helps legal not define a policy from scratch, but rather simply determine if for any reason what you’re doing is acceptable or not. After you’ve received approval, ensure that legal’s statements are documented in an accessible and referenceable location.

Product security or development teams’ support (if applicable)

If your organization develops software (or hardware) for internal or external use, you’re susceptible to security vulnerabilities. These may be found via external notification of a product vulnerability (security researchers), by investigating an incident where a vulnerability allowed ingress for an attacker, or by penetration testers (aka pentesters) scoped to test your products. Regardless of how they’re disclosed, the vulnerabilities require patching. If you have teams that focus on product security (as opposed to infrastructure, network, and system security), they should have a direct relationship with development teams to understand what a secure fix requires and to prioritize the development, testing, and deployment of that fix. Without a dedicated product security team, you’ll need to establish a relationship with the development org to build processes for dealing with product vulnerabilities yourself.

Also consider what value, as an investigative entity, your CSIRT can provide to the organization when a vulnerability is discovered. Can you help to determine risk by scanning all affected products for susceptibility? Do you have any log evidence showing signs of compromise prior to disclosure? Can you build a playbook item to detect abuse of the vulnerability until the product security group is able to deploy a patch?

Additionally, if you lack a robust centralized logging infrastructure where developers can send their app logs, or a well-defined logging policy requiring generated events suitable for security monitoring, you may need to contact the development teams directly to acquire evidence to support an investigation. You should understand the process, be it a helpdesk ticket, bug submission, or email, to request investigative support data prior to that data actually being needed.

In organizations with no product development or product security teams, earning the trust and understanding the capabilities of other investigative groups can prove mutually valuable.

Friends on the Outside

Having solid relationships external to your organization will also go a long way toward improving the capabilities and expertise of the incident response team, not to mention the opportunities for best practice sharing and good “netizenship.” These are some of the organizations you’ll need to work with:

Internet service providers (ISPs) and other networked peers

In lieu of in-house detection and mitigation capabilities, your last resort in distributed denial of service (DDoS) defense is working with your upstream provider(s) to identify and block the source of possibly spoofed traffic. During a denial of service incident, you, your network administrators, and your ISP must work together to isolate and contain or redirect the abusive traffic.

Local and national law enforcement

Only in (hopefully) rare cases will an incident response team need to interact with law enforcement. However, there are plenty of incident types where the two paths will cross. In some cases, national law enforcement groups will request additional information about potential victims or attackers possibly engaged in activities on your organization’s network. National law enforcement agencies may also release information to help detect criminal attacks by sharing indicators of compromise.

In the event of a crime involving computer evidence related to your organization, local law enforcement may request data, systems, or statements from IT staff on any details pertinent to their investigation. Having at least one contact in local law enforcement gives you someone to reach out to when illegal activity is discovered during a CSIRT investigation.

In some ways, law enforcement teams face challenges similar to those of an incident response team. Both perform forensics, conduct person-of-interest investigations, and correlate data from disparate systems. Though applied differently, these commonalities provide opportunities for sharing best practices.

Product vendors and technical support

The larger your toolset, the more potential product vulnerabilities and ensuing security patches you have to keep up with. Vendor support can also provide an avenue to file bugs or request feature improvements. Further, on a contract basis, vendors’ professional services groups (PSGs) will integrate their offerings within your environment. Keeping track of vulnerabilities in your systems as well as the overall organization ensures readiness when major flaws are found and exposed. Subscribing to mailing lists such as Full Disclosure provides the CSIRT with early warning of future incidents related to exploitation of newly released vulnerabilities.

When working with your own tools, it’s also great to have a reliable relationship with technical support. You don’t want to discover a new bug and have to wade through Sisyphean escalation chains during the middle of an incident when you really need them to work.

Industry experts and other incident response teams

Security conferences provide multiple avenues to establish and maintain beneficial relationships. Attending talks and interacting with speakers, participating in birds-of-a-feather or meet-the-engineer sessions, vendor events, or drinks at the bar all provide opportunities to connect with like-minded individuals with techniques and ideas to share. Somebody might be looking to deploy the same systems that you just deployed, or may provide a service you never knew existed, or perhaps has approached a shared security problem in an entirely different way.

CSIRTs can certainly exist without any external relationships, but their operations are only enhanced by outside perspectives. Internal relationships are absolutely critical, however, and all successful teams must cultivate them.

The Tool Maketh the Team

To create an incident response playbook to respond to security threats, you need an existing monitoring infrastructure or the intention and knowledge to build one, data retention long enough to alert or investigate, and repositories to collect, store, analyze, and present data. Assuming you have the infrastructure already, or a plan in the works, don’t forget that running a network of systems, logs, and monitors means plenty of IT work, both at the outset of a deployment and in ongoing maintenance, documentation, and tuning.

Even the smallest-scale enterprise system has many moving parts. The smallest and worst case is a nonredundant single machine performing all of your dependent tasks. The largest systems will have hundreds or thousands of hosts, disks, processors, applications, and a network connecting them all. In either case, to ensure availability, you need to be able to both detect if any part of the system breaks and have a process to get it fixed. This is especially important for systems on which other users depend. Got a broken inline intrusion prevention system (IPS)? Web proxy failed? If you didn’t build in redundancy or fail-open measures, you can be sure your users will let you know.

Any system administrator can rattle off the components of a system that need to be monitored: that your hosts are online, that the correct processes are running with the correct arguments, that you receive the intended data, and that you have adequate disk storage, efficient disk operations, and efficient query processing. The supporting infrastructure for security monitoring, large or small, is no different from any other enterprise system. You must be able to identify when these key performance indicators are nearing failure or actually failing. Beyond detecting problems, you must also have a support infrastructure in place to address the failure point in a reasonable amount of time. Don’t count on hackers to wait until you’ve replaced a failed disk.
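A minimal sketch of the kind of health checks described above might look like the following. It is not a substitute for a real monitoring system; the function names, the 10 percent free-space threshold, and the check labels are assumptions chosen for illustration, and only the Python standard library is used.

```python
import shutil
import socket


def disk_ok(path="/", min_free_fraction=0.10):
    """True if the filesystem holding `path` has at least the given fraction free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


def port_reachable(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def failing_checks(results):
    """Return the names of checks whose result is False, sorted for stable output."""
    return sorted(name for name, ok in results.items() if not ok)


# Only the local disk check runs here; a network check would be added the
# same way, for example results["collector"] = port_reachable(host, port).
results = {"disk_free": disk_ok(".")}
print("Failing checks:", failing_checks(results))
```

In practice the output of `failing_checks` would feed whatever alerting or ticketing process the team already uses, so that a failed disk or silent log collector is noticed before an investigation depends on it.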

Selecting the right tools for the job is also critically important. Ensuring that you have the capacity to collect, store, and analyze data requires an understanding of your network, devices, and potential data volume and rate. Chapter 6 goes into much deeper detail on how to make the best choices for your environment.

Choose Your Own Adventure

CSIRTs need proper tooling, relationships, and a solid technical background. However, to have any kind of authority, teams need to be recognized within an organization’s InfoSec or computing policies. Having a CSIRT internally means an expectation of network monitoring, as well as possible investigations into activity performed on an organization’s assets. Policies accepted by everyone in the organization must include language indicating the role and obligation of the CSIRT.

Company policies specifically stipulate (dis)allowed behaviors, requirements, processes, and standards. Rules are made to be broken, so policies must be enforced. These policies will serve as the basis for your charter. A solid charter will help you identify roles and responsibilities that your CSIRT will require to be successful in your own environment. For instance, if you provide a paid service to customers, what level of detection capabilities (if any) do your clients expect? Who is responsible for physical security at your organization? Are PC rebuilds mandatory to fix malware infections? Ideally, your charter should be documented, accessible, and approved by your management and senior management, as well as third-party groups such as legal or HR. It is from this charter that you will draw your enforcement powers.

Not every possible activity a CSIRT might perform necessarily has to be enshrined in policy; however, it can be beneficial to explicitly mention a few directives. Remember that all policy development should be closely aligned with an organization’s overall strategy and operations. Not every CSIRT will enforce identical policies; however, fundamentally they should be expected and explicitly permitted to:

Monitor and audit equipment, systems, and network traffic for security event monitoring, incident detection, and intrusion detection.
Execute efficient incident management procedures, including, but not limited to, disabling network access, revoking access rights and credentials, or seizure and forensic examination of electronic and computing devices.
Maintain exhaustive and exclusive control over detecting, capturing, storing, analyzing, or mitigating computer security incidents.

Again, policies are totally dependent on a business and the role a CSIRT plays, whether internal or external. Having a defined constituency can also clear up any gray areas about a CSIRT’s span of control. For example, a CSIRT might be charged with protecting corporate or organizational data, but not customer data. On the other hand, a team might be responsible for monitoring corporate networks, customer networks and data, and partner interconnections. Understanding the scope of a CSIRT’s mission helps ensure proper resourcing and expectations.

An example policy establishing a charter might look something like:

The incident response team has the authority to implement necessary actions for incident management, including but not limited to, removal of network access, revocation of access rights, or seizure and forensic examination of electronic and computing devices owned by [organization] or devices communicating on internal networks and business systems, whether owned or leased by [organization], a third party, or the employee. Data collected or analyzed during the course of an investigation will be handled according to the procedures described in the Incident Response Handbook.

In adherence to event logging, intrusion detection, incident handling, and monitoring standards, the incident response team must monitor the [organization’s] network and any networks owned by [organization], including all interconnections and points of egress and ingress.

Buy or Build?

The decision to develop an internal CSIRT or hire professional incident response services can be a difficult one. On one hand, you are absolving your organization from the overhead of hiring full-time employees onto the payroll, yet on the other hand, you are paying for a subscription service that can never replicate the contextual knowledge necessary for really good incident response. There are numerous offerings in this space, like managed security services of all types, consultants that help with one-time security incidents, or hired professional services from security companies to help you with your own response.

Even with an outsourced incident response service, it’s still just as important to establish policies that define the scope of their access and authority. Some organizations hire clean-up incident response teams post hoc to triage and remove any remaining problems, investigate, and deliver a detailed incident write-up. Other teams offer ongoing externally hosted security monitoring and response services by deploying sensors to your network and managing and monitoring them remotely. In both cases, third-party companies are working with your organization’s data and networks and should adhere to policies similar to those an internal incident response team might use.

Because we advocate for developing your own playbook and response capabilities, it follows that we are proponents of the homegrown CSIRT. Contextual knowledge, and a sense of ownership and domain over an organization, give an in-house incident response team the edge when it comes to overall efficacy and efficiency. Also remember that it’s difficult for computers to understand context. No existing algorithm can factor in certain aspects of a security incident.

Run the Playbook!

There are any number of ways to protect your organization, and what works for one company might not work for another. Culture, priority, risk tolerance, and investment all influence how well an organization protects itself from computer security threats.

Whatever path your organization takes, understand that to craft an effective playbook backed by human intelligence, you must understand more than how to detect computer viruses. There’s a broad variety of threats, attacks, incidents, and investigations that come up during the course of business, and it’s great to have a skilled team to own and manage them all. Having a solid team in place is the first step to executing an effective playbook. Having a solid understanding of your network, threats, and detection techniques will help you craft your own tailored incident response playbook that can repeatedly adjust to business, cultural, and environmental changes, as well as to the organization’s risk acceptance levels.

Chapter Summary

Keeping an organization safe from attack, as well as having a talented team available to respond quickly, minimizes damage to your reputation and business.
Fostering and developing relationships with IT, HR, legal, executives, and others is critical to the success of a CSIRT.
Sharing incident and threat data with external groups improves everyone’s security and gives your organization credibility and trust with groups that might be able to help in the future.
A good team relies on good tools, and a great team optimizes their operations.
A solid and well-socialized InfoSec policy gives the incident response team the authority and charter to protect networks and data.

Crafting the InfoSec Playbook by Matthew Valites, Brandon Enright, Jeff Bollinger