Chapter 1. Introduction to Cybersecurity Science
This chapter will introduce the concept—and importance—of cybersecurity science, the scientific method, the relationship of cybersecurity theory and practice, and high-level topics that relate to science, including human factors and metrics.
Whether you’re a student, software developer, forensic investigator, network administrator, or have any other role in providing cybersecurity, this book will teach you the relevant scientific principles and flexible methodologies for effective cybersecurity. Essential Cybersecurity Science focuses on real-world applications of science to your role in providing cybersecurity. You’ll learn how to conduct your own experiments to evaluate security assurances.
Let me offer a few reasons why science is worth the trouble.
Science is respected. A majority of the population sees value in scientific inquiry and scientific results. Advertisers appeal to it all the time, even if the science is nonsensical or made up. People will respect you and your work in cybersecurity if you demonstrate good science. “In the past few years, there has been significant interest in promoting the idea of applying scientific principles to information security,” said one report.1 Scientific research can help convince your audience about the value of a result.
Science is sexy. In addition to respect, many nonscientists desire to understand and be part of a field they admire. Once perceived as dry, boring, and geeky, science is becoming a thing of admiration, and more and more people want to be identified with it.
Science provokes curiosity. Information security (infosec) professionals are curious. They ask good questions and crave information, as evidenced by the increasing value being placed on data science. Science is a vehicle for information, and answers stimulate more questions. Scientific inquiry brings a deeper understanding about the cybersecurity domain.
Science creates and improves products. In the commercial space, the market drives cybersecurity. Scientific knowledge can improve existing products and lead to groundbreaking innovation and applications. For infosec decision-makers, the scientific method can make product evaluations defensible and efficient.
Science advances knowledge. Science is one of the primary ways that humans unearth new knowledge about the world. Participants in science have the opportunity to contribute to the body of human understanding and advance the state of the art. In cybersecurity in particular, science will help prove practices and techniques that work, moving us away from today’s practice of cybersecurity “folk wisdom.”
Scientific experimentation and inquiry reveal opportunities to optimize and create more secure cyber solutions. For instance, mathematics alone can help cryptographers determine how to design more secure crypto algorithms, but mathematics does not govern the process of how to design a useful network mapping visualization. Visualization requires experimentation and repeatable user studies. Validation in this context is more like justification for design choices. What is the optimal sampling rate for NetFlow in my situation? Trying to answer that question and maximize the validity of the answer is a scientific endeavor. Furthermore, you can learn and apply lessons from what others have done in the past.
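The NetFlow question above can be explored empirically before touching a production router. The following Python sketch simulates 1-in-N packet sampling and measures how far the scaled-up estimate of a flow's packet count strays from the truth. It assumes each packet is sampled independently with probability 1/N (random sampling rather than a deterministic every-Nth scheme), and the flow sizes and sampling rates are hypothetical placeholders for numbers from your own traffic.

```python
import random
import statistics


def estimate_flow_size(true_packets: int, rate_n: int, rng: random.Random) -> int:
    """Sample each packet independently with probability 1/rate_n, then
    scale the sampled count back up by rate_n to estimate the flow size."""
    sampled = sum(1 for _ in range(true_packets) if rng.random() < 1.0 / rate_n)
    return sampled * rate_n


def mean_relative_error(true_packets: int, rate_n: int,
                        trials: int = 100, seed: int = 1) -> float:
    """Average |estimate - truth| / truth over repeated simulated trials."""
    rng = random.Random(seed)
    errors = [
        abs(estimate_flow_size(true_packets, rate_n, rng) - true_packets) / true_packets
        for _ in range(trials)
    ]
    return statistics.mean(errors)


if __name__ == "__main__":
    # Hypothetical flow sizes and sampling rates; substitute values from your network.
    for flow_size in (100, 10_000):
        for rate_n in (1, 10, 100, 1000):
            err = mean_relative_error(flow_size, rate_n)
            print(f"flow={flow_size:>6} pkts  1-in-{rate_n:<4}  mean relative error={err:.1%}")
```

Under this model, a 100-packet flow sampled at 1-in-100 or 1-in-1000 is estimated very poorly, while a 10,000-packet flow tolerates 1-in-10 with an error of a few percent. Which rate is “optimal” depends on which flows you need to see, and that is the kind of question the experiment is meant to answer for your situation.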
What Is Cybersecurity Science?
Cybersecurity science is an important aspect of the understanding, development, and practice of cybersecurity. Cybersecurity is a broad category, covering the technology and practices used to protect computer networks, computers, and data from harm. People throughout industry, academia, and government all use formal and informal science to create and expand cybersecurity knowledge. As a discipline, cybersecurity requires reliable knowledge to explore and reason about how and why we build or deploy security controls.
When I talk about applying science and the scientific method to cybersecurity, I mean leveraging the body of knowledge about cybersecurity (science) and a particular set of techniques for testing a hypothesis against empirical reality (the scientific method).
Unfortunately, science has a reputation for being stuffy and cold, and something that only people in white lab coats are excited about. As a cybersecurity practitioner, think of science as a way to explore your curiosity, an opportunity to discover something unexpected, and a tool to improve your work.
You benefit every day from the experimentation and scientific investigation done by people in cybersecurity. To cite a few examples:
Microsoft Research provides key security advances for Microsoft products and services, including algorithms to detect tens of millions of malicious Hotmail accounts.
Government and private researchers created Security-enhanced Linux.
Research at Google helps improve products such as Chrome browser security and YouTube video fingerprinting.
Symantec Research Labs has contributed new algorithms, performance speedups, and products for the company.
Cybersecurity is an applied science. That is, people in the field often apply known facts and scientific discoveries to create useful applications, often in the form of technology. Other forms of science include natural science (e.g., biology), formal science (e.g., statistics), and social science (e.g., economics). Cybersecurity overlaps with, and is influenced by, social sciences such as economics, sociology, and criminology.
Like applied science, cybersecurity science often takes the form of applied research—the goal of the work is to discover how to meet a specific need. For example, if you wanted to figure out how to tune your intrusion detection system, that could be an applied research project.
The Importance of Cybersecurity Science
Every day, you as developers and security practitioners deal with uncertainty, unknowns, choices, and crises that could be informed by scientific methods. You might also face very real adversaries who are hard to reason about. According to a report on the science of cybersecurity, “There is every reason to believe that the traditional domains of experimental and theoretical inquiry apply to the study of cyber-security. The highest priority should be assigned to establishing research protocols to enable reproducible experiments.”3
To get started, look at the following examples of how cybersecurity science could be applied to practical cybersecurity situations:
Your job is defending your corporate network and you have a limited budget. You’ve been convinced by a new security concept called Moving Target Defense, which says that controlling change across multiple system dimensions increases uncertainty and complexity for attackers. Game theory is a scientific technique well-suited to modeling the arms race between attackers and defenders, and quantitatively evaluating dependability and security. So you could try setting up an experiment to determine how often you’ll have to apply moving target defense if you think the attacker will try to attack you 10 times a day.
As a malware analyst, you are responsible for writing intrusion detection system (IDS) signatures to identify and block malware from entering your network. You want the signature to be accurate, but IDS performance is also important. If you knew how to model the load, you could write a program to determine the number of false negatives for a given load.
You’ve written a new program that could revolutionize desktop security. You want to convince people that it’s better than today’s antivirus. You decide to run an analysis to determine whether people will buy your software by comparing the number of compromises when using your product versus antivirus and also factoring in the cost of the two products. This is a classic statistical gotcha because you’ve combined two incompatible variables (number of compromises and dollars) that are measured in different units.
You’ve developed a smartphone game that’s taking off in the marketplace. However, users have started complaining about the app crashing randomly. You would be wise to run an experiment with a random “monkey” that runs your app over and over, pressing buttons in different sequences to help identify which code path leads to the crash.
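For the Moving Target Defense example, a full game-theoretic model is beyond a short sketch, so the following Python code swaps in a simpler Monte Carlo simulation. It assumes the defender reconfigures on a fixed schedule (every `shuffle_period` hours), each of the attacker's 10 daily attempts starts at a uniformly random time, and an attempt succeeds only if the `recon_hours` of reconnaissance and the exploit both fall between the same two reconfigurations. All function names and parameter values are hypothetical.

```python
import math
import random


def attack_succeeds(start: float, recon_hours: float, shuffle_period: float) -> bool:
    """The exploit works only if no reconfiguration happens between the start of
    reconnaissance and the exploit, i.e. both fall in the same shuffle epoch."""
    return math.floor(start / shuffle_period) == math.floor((start + recon_hours) / shuffle_period)


def successes_per_day(shuffle_period: float, recon_hours: float,
                      attempts_per_day: int = 10, days: int = 2000, seed: int = 7) -> float:
    """Monte Carlo estimate of successful attacks per day. Attack start times are
    drawn uniformly at random within each day (a modeling assumption)."""
    rng = random.Random(seed)
    wins = 0
    for day in range(days):
        for _ in range(attempts_per_day):
            start = day * 24 + rng.uniform(0, 24)
            if attack_succeeds(start, recon_hours, shuffle_period):
                wins += 1
    return wins / days


def longest_acceptable_period(recon_hours, max_successes_per_day, candidates):
    """Longest candidate shuffle period (hours) whose simulated success rate stays
    at or below the threshold; None if no candidate qualifies."""
    ok = [p for p in candidates
          if successes_per_day(p, recon_hours) <= max_successes_per_day]
    return max(ok) if ok else None


if __name__ == "__main__":
    # Hypothetical inputs: 2 hours of attacker reconnaissance, tolerate <= 1 success/day.
    for period in (1, 2, 4, 8, 24):
        print(f"shuffle every {period:>2} h -> {successes_per_day(period, 2.0):.2f} successes/day")
    print("longest acceptable period:", longest_acceptable_period(2.0, 1.0, [1, 2, 4, 8, 24]))
```

Under these assumptions the per-attempt success probability is max(0, 1 - recon_hours/shuffle_period), so the simulation can be checked against a closed form: with 2 hours of reconnaissance, a 4-hour shuffle period gives about 5 successes per day. A real study would also model how the attacker adapts to your schedule, which is where game theory earns its keep.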
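For the IDS example, one way to model load is as a Poisson arrival process. The Python sketch below assumes a hypothetical sensor that can inspect at most `capacity` packets per second and passes the overflow through uninspected (fail-open), and a signature that catches inspected malicious packets with probability `detection_rate`. The load levels, malicious fraction, and detection rate are invented placeholders, not measurements of any product.

```python
import random


def poisson_arrivals(rate: float, rng: random.Random) -> int:
    """Count arrivals in one second of a Poisson process by summing
    exponentially distributed inter-arrival gaps."""
    count, t = 0, rng.expovariate(rate)
    while t < 1.0:
        count += 1
        t += rng.expovariate(rate)
    return count


def simulate_false_negatives(mean_load, capacity, malicious_fraction,
                             detection_rate, seconds=30, seed=3):
    """Return (false_negatives, malicious_packets) for a hypothetical sensor that
    inspects at most `capacity` packets per second and lets the rest through
    uninspected (fail-open). Inspected malicious packets are caught with
    probability `detection_rate`."""
    rng = random.Random(seed)
    false_negatives = malicious = 0
    for _ in range(seconds):
        arrivals = poisson_arrivals(mean_load, rng)
        for i in range(arrivals):
            if rng.random() >= malicious_fraction:
                continue  # benign packet
            malicious += 1
            inspected = i < capacity  # only the first `capacity` packets get inspected
            if not (inspected and rng.random() < detection_rate):
                false_negatives += 1
    return false_negatives, malicious


if __name__ == "__main__":
    # Hypothetical sensor: 1,000 packets/s capacity, 1% malicious, 95% detection.
    for load in (500, 1000, 2000, 4000):
        fn, total = simulate_false_negatives(load, 1000, 0.01, 0.95)
        print(f"load={load:>4} pkt/s  false negatives={fn:>4} of {total} malicious")
```

Below capacity, false negatives come only from packets the signature misses; above it, uninspected overflow dominates. Whether your sensor fails open, drops, or queues traffic under load is an assumption to verify before trusting numbers like these.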
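For the smartphone game, the monkey idea can be illustrated in miniature. On Android, the SDK's UI/Application Exerciser Monkey plays this role against a real device; the Python sketch below instead drives a hypothetical `ToyApp` class with a planted bug, presses random buttons until it crashes, and then greedily shrinks the crashing sequence (a simplified cousin of delta debugging) to point at the responsible code path.

```python
import random

BUTTONS = ["play", "pause", "settings", "back", "home"]


class ToyApp:
    """Hypothetical stand-in for the game. Planted bug: pressing 'pause' on the
    settings screen before any game has started touches a session that is None."""

    def __init__(self):
        self.screens = ["home"]  # navigation stack; never empty
        self.session = None      # active game session, if any

    def press(self, button: str) -> None:
        current = self.screens[-1]
        if button == "play":
            self.session = {"paused": False}
            self.screens.append("game")
        elif button == "settings":
            self.screens.append("settings")
        elif button == "back" and len(self.screens) > 1:
            self.screens.pop()
        elif button == "home":
            self.screens = ["home"]
            self.session = None
        elif button == "pause":
            if current == "settings":
                self.session["paused"] = True  # BUG: no None check on this path
            elif self.session is not None:
                self.session["paused"] = True


def crashes(presses) -> bool:
    """Replay a button sequence on a fresh app and report whether it crashes."""
    app = ToyApp()
    try:
        for button in presses:
            app.press(button)
    except Exception:
        return True
    return False


def run_monkey(runs=1000, max_presses=50, seed=42):
    """Press random buttons until something crashes. Returns the crashing
    sequence and the exception, or None if no run crashed."""
    rng = random.Random(seed)
    for _ in range(runs):
        app, presses = ToyApp(), []
        for _ in range(max_presses):
            presses.append(rng.choice(BUTTONS))
            try:
                app.press(presses[-1])
            except Exception as exc:  # any crash is worth recording
                return presses, exc
    return None


def shrink(presses):
    """Greedily drop presses that are not needed to reproduce the crash."""
    i = 0
    while i < len(presses):
        candidate = presses[:i] + presses[i + 1:]
        if crashes(candidate):
            presses = candidate
        else:
            i += 1
    return presses


if __name__ == "__main__":
    result = run_monkey()
    if result:
        presses, exc = result
        print(f"crash after {len(presses)} presses: {exc!r}")
        print("minimized sequence:", shrink(presses))
```

Seeding the random generator makes the monkey's run repeatable, so a crash it finds can be replayed while you debug.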
Cybersecurity requires defenders to think about worst-case behaviors and rare events, and that can be challenging to model realistically. Cybersecurity comprises large, complex, decentralized systems—and scientific inquiry dislikes complexity and chaos. Cybersecurity must deal with inherently multiparty environments, with many users and systems. Accordingly, it becomes difficult to pinpoint the important variable(s) in an experiment with these complex features.
Cybersecurity is complex because it is constantly changing. As soon as you think you’ve addressed a problem, the problem or the environment changes. Amazon, which has reportedly sold as many as 306 items per second, commissioned a study to determine how many differently shaped and sized boxes it needed. The mostly mathematical study went on for over a year, and the team produced a recommendation. The following day, Amazon launched an identical study to re-examine the exact same problem because buyers’ habits had changed and people were buying goods of different sizes and shapes. Cybersecurity, like shopping habits, is a constantly changing problem, as evidenced by dynamic Internet routing and the unpredictable demand on Internet servers and services.
Science isn’t just about solving problems by confirming hypotheses; science is also about falsifiability. Rather than trying to prove a hypothesis correct, the idea is to try to disprove it. Karl Popper set out this philosophy in his 1935 book Logik der Forschung, later published in English as The Logic of Scientific Discovery. Popper used falsifiability as the demarcation criterion for science but noted that science often proceeds based on claims or conjectures that cannot (easily) be verified. If something is falsifiable, that doesn’t mean that it is false. It means that if the hypothesis were false, you could demonstrate its falsehood. For example, if a newspaper offers the hypothesis “China is the biggest cyber threat,” that claim is nonfalsifiable because no observation could prove it wrong. Perhaps it is based on undisclosed evidence. If the statement is wrong, all you will ever find is an absence of evidence, so there is no way to test the hypothesis empirically.
Central motivations for the scientific method are to uncover new truths and to root out error, common goals shared with cybersecurity. Science has been revealing insights into “what if” questions for thousands of years. Businesses need new products and innovations to stay alive, and science can produce amazing and sometimes unexpected results to create and improve technology and cybersecurity. Science can also provide validation for the work you do by showing—even proving—that your ideas and solutions are better than others. If you choose to present your findings in papers or at conferences, you also receive external validation from your peers and contribute to the global body of knowledge.
Think about how much science plays a part at Google, even aside from security. The 1998 paper in which Google’s founders, Sergey Brin and Larry Page, described the PageRank algorithm introduced a novel idea that launched a $380 billion company. Today, Google researchers publish dozens of papers on security every year, and those results inform security in their products and services, from Android to Gmail. Scientific advances conducted inside and outside the company undoubtedly save and make money for Google.
Lastly, learning science consists, in part, of learning the language of science. Once you learn the language, you’ll be better equipped to understand scientific conversations and papers. You will also have the ability to more clearly communicate your results to others, and it’s more likely that other amateur and professional scientists will respect your work.
The Scientific Method
The scientific method is a structured way of investigating the world. This group of techniques can be used to gain knowledge, study the state of the world, correct errors in current knowledge, and integrate facts. Importantly for us, the scientific method contributes to a theoretical and practical understanding of cybersecurity.
Our modern understanding of the scientific method stems from Francis Bacon’s Novum Organum (1620) and the work of Descartes, though others have refined the process since then. The Oxford English Dictionary defines the scientific method as “a method of observation or procedure based on scientific ideas or methods; specifically an empirical method that has underlain the development of natural science since the 17th century.” An empirical method is one in which the steps are based on observation, investigation, or experimentation.
At its heart, the scientific method contains only five essential elements:
Formulating a question from previous observations, measurements, or experiments
Induction and formulation of hypotheses
Making predictions from the hypotheses
Experimental testing of the predictions
Analysis and modification of the hypotheses
These steps are said to be systematic. That is to say, they are conducted according to a plan or organized method. If you jump around the steps in an unplanned way, you will have violated the scientific method. In Chapter 2 we will discuss how to do each of these five steps.
There are also five governing principles of the scientific method. These principles are:
Objective. A fair, objective experiment is free from bias and considers all the data (or a representative sample), not just data that validates your hypothesis.
Falsifiable. It must be possible to show that your hypothesis is false.
Reproducible. It must be possible for you or others to reproduce your results.4
Predictable. The results from the scientific method can be used to predict future outcomes in other situations.
Verifiable. Nothing is accepted until verified through adequate observations or experiments.
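Reproducibility in a computational experiment starts with pinning down and recording every input. The Python sketch below is a toy experiment (estimating how many random 8-character lowercase-and-digit passwords contain a digit) that draws all randomness from one seeded generator and returns its parameters and interpreter version alongside the result. The experiment itself and the record format are illustrative assumptions, not a standard.

```python
import json
import platform
import random
import string


def run_experiment(seed: int, n_samples: int) -> dict:
    """Toy experiment: estimate the fraction of random 8-character passwords
    (lowercase letters and digits) that contain at least one digit.
    All randomness comes from one seeded generator, so reruns are identical."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + string.digits
    hits = 0
    for _ in range(n_samples):
        password = "".join(rng.choice(alphabet) for _ in range(8))
        if any(ch.isdigit() for ch in password):
            hits += 1
    return {
        "result": hits / n_samples,
        # Record every input needed to rerun the experiment exactly.
        "params": {"seed": seed, "n_samples": n_samples},
        # Most random-module methods may change between Python versions,
        # so record the interpreter version too.
        "python_version": platform.python_version(),
    }


if __name__ == "__main__":
    record = run_experiment(seed=2016, n_samples=10_000)
    print(json.dumps(record, indent=2))
    # Anyone holding the record can rerun with the same parameters and compare.
    rerun = run_experiment(**record["params"])
    assert rerun == record
```

The result can also be sanity-checked analytically: the expected fraction is 1 - (26/36)^8, roughly 0.926. Recording the Python version matters because the `random` module documentation only guarantees that `random()` reproduces the same sequence across versions for a given seed; other methods such as `choice()` may change.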
Interestingly, the scientific method usually isn’t part of graduate computer science curricula or computer security professional certifications. Many students and professionals haven’t considered the scientific method since grade school and no longer remember how to apply it to their profession. The problem may be systemic. Take performance, for example. Say you have a malware detection tool and want to analyze 1,000 files. A theoretical computer scientist might look at your malware detection algorithm and say, “the asymptotic bound of this algorithm is O(n²) time,” meaning its running time grows in proportion to the square of the size of the input. Informative, huh? It might be, but it masks implementation details and constant factors that determine how much wall clock time the algorithm actually takes in practice.
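The O(n²) claim can itself be turned into a small experiment. Hypothesis: doubling the number of files roughly quadruples running time. Prediction: a measured time ratio near 4. Test: measure wall clock time. The Python sketch below uses a hypothetical pairwise comparison of 64-bit fingerprints as a stand-in for an O(n²) detector step; it is not any real tool's algorithm.

```python
import random
import time


def count_similar_pairs(fingerprints, max_distance):
    """Hypothetical O(n^2) detector step: compare every pair of 64-bit
    fingerprints and count pairs whose Hamming distance is <= max_distance."""
    count = 0
    n = len(fingerprints)
    for i in range(n):
        for j in range(i + 1, n):
            if bin(fingerprints[i] ^ fingerprints[j]).count("1") <= max_distance:
                count += 1
    return count


def best_time(n, repeats=3, seed=0):
    """Best-of-`repeats` wall clock time (seconds) for n random fingerprints."""
    rng = random.Random(seed)
    fingerprints = [rng.getrandbits(64) for _ in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        count_similar_pairs(fingerprints, max_distance=10)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Hypothesis: doubling n roughly quadruples running time (the O(n^2) prediction).
    t_small, t_large = best_time(500), best_time(1000)
    print(f"n=500: {t_small:.3f}s  n=1000: {t_large:.3f}s  ratio={t_large / t_small:.2f}")
```

The measured ratio depends on hardware, interpreter, caching, and input, and at small n fixed overheads can pull it away from 4. That gap is exactly the information an asymptotic bound hides. Taking the best of several repeats reduces noise from other processes on the machine.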
There are many research designs to choose from in the scientific method. The one you pick will be primarily based on the information you want to collect, but also on other factors such as cost. This book mainly focuses on experimentation, but other research methods are shown in Table 1-1.
Research method | Aim of the study |
---|---|
Case study | Observe and describe |
Survey | Observe and describe |
Natural environment observation | Observe and describe |
Longitudinal study | Predict |
Observation study | Predict |
Field experiment | Determine causes |
Double-blind experiment | Determine causes |
Literature review | Explain |
The way you approach cybersecurity science depends on you and your situation. What if you don’t have the time or resources to do precise scientific experiments? Is that OK? It probably depends on the circumstances. If you build software that is used in hospitals or nuclear command and control, I hope that science is an important part of the process. Scientists often talk about scientific rigor. Rigor is related to thoroughness, carefulness, and accuracy. Rigor is a commitment to the scientific method, especially in paying attention to detail and being unbiased in the work.
Cybersecurity Theory and Practice
“In theory, there is no difference between theory and practice. In practice, there is.”5 So goes a quote once overheard at a computer science conference. The tension between theory and practice long predates cybersecurity. The argument goes that practitioners don’t understand fundamentals, leading to suboptimal practices, and that theorists are out of touch with real-world practice.
Research and science often emerge following practical developments. “The steam engine is a perfect example,” writes Dr. Henry Petroski. “It existed well before there was a science of thermodynamics to explain what was happening from a theoretical point of view. The Wright Brothers designed a plane before there was a theory of aerodynamics.” Cybersecurity may follow a similar trajectory, with empiricists running a bit ahead of theorists.
The application of theory to practice has a direct impact on our lives. Consider approaches to protecting a system from denial-of-service attacks. In theory, it is impossible to distinguish between legitimate and malicious network traffic because malicious traffic can imitate legitimate traffic so effectively. In practice, an administrator may find a pattern or fingerprint in attack traffic that allows her to block only the malicious traffic.
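The practical half of that contrast can be sketched in a few lines. The Python code below compares how often each request fingerprint appears during an attack window versus a baseline window and flags fingerprints that are both common and sharply overrepresented. The (user agent, path) fingerprint, the thresholds, and the traffic are all hypothetical. If the attacker perfectly imitates legitimate fingerprints, nothing stands out, which is the theoretical point.

```python
from collections import Counter


def suspicious_fingerprints(baseline, attack, min_ratio=10.0, min_share=0.05):
    """Flag fingerprints that make up at least `min_share` of attack-window traffic
    and are at least `min_ratio` times more common than in the baseline window."""
    base_counts, attack_counts = Counter(baseline), Counter(attack)
    flagged = set()
    for fp, count in attack_counts.items():
        attack_share = count / len(attack)
        # Crude add-one smoothing so fingerprints unseen in the baseline avoid division by zero.
        base_share = (base_counts[fp] + 1) / (len(baseline) + 1)
        if attack_share >= min_share and attack_share / base_share >= min_ratio:
            flagged.add(fp)
    return flagged


if __name__ == "__main__":
    # Hypothetical (user_agent, path) fingerprints for each request.
    baseline = [("Firefox", "/"), ("Chrome", "/login")] * 500 + [("curl", "/api")] * 10
    attack = baseline + [("BotUA/1.0", "/search")] * 5000
    flagged = suspicious_fingerprints(baseline, attack)
    allowed = [req for req in attack if req not in flagged]
    print("block:", flagged, "| allowed requests:", len(allowed), "of", len(attack))
```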
One reason for the disconnect between theory and practice in cybersecurity is that there are few axioms in security. Despite decades of work in cybersecurity, the community has failed to uncover the building blocks that you might expect from a mature field. In 2011, the US government published “Trustworthy Cyberspace: Strategic Plan for the Federal Cybersecurity Research and Development Program”. As a result of this strategy, the government created the Science of Security Virtual Organization (SoS VO) to research “first principles and the fundamental building blocks for security and trustworthiness.” The NSA now funds academic research groups called “lablets” to conduct research aimed at “establishing scientific principles upon which to base trust in security” and “to bring scientific rigor to research in the cybersecurity domain.” This work aims to improve cybersecurity theory, which will hopefully in turn translate into practical cybersecurity implementations.
Note
Axioms are assumptions that are generally accepted as true without proof. For example, the mathematical axiom of transitivity says that if x = y and y = z, then x = z.
Pseudoscience
A word of caution: science can be used for good, but it can also deceive if it is misused, misapplied, or misunderstood. Pseudoscience goes further: it is a claim or belief that is falsely presented as, or mistakenly regarded as, science. Theories about the Bermuda Triangle are pseudoscience because they depend heavily on untested assumptions. Beware of misinterpretation and inflation of scientific findings. Popular culture was largely misled by the media hype over the “Mozart effect,” which stemmed from a 1993 paper reporting a short-lived improvement in spatial-reasoning test scores among college students who had just listened to a Mozart sonata.
Michael Gordin, a Princeton historian of science, wrote in his book The Pseudoscience Wars (University of Chicago Press, 2012), “No one in the history of the world has ever self-identified as a pseudoscientist.” Pseudoscience is something that we recognize after the work has been done. You should learn to recognize the markers of pseudoscience in other people’s work and in your own.
For more cautionary notes on scientific claims, especially in marketing, see Appendix A.
Human Factors
Science is a human pursuit. Even when humans are not the object of scientific investigation, as they often are in biology or psychology, humans are the ones conducting all scientific inquiry, including in cybersecurity. The 2015 Verizon Data Breach Investigations Report pointed out that “the common denominator across the top four [incident] patterns—accounting for nearly 90% of all incidents—is people.” This section introduces the high-level roles for humans in cybersecurity science and the important concept of recognizing human bias in science.
Roles Humans Play in Cybersecurity Science
Humans play a role in cybersecurity science in at least four ways:
Humans as developers and designers. We will talk a lot about cybersecurity practitioners who think and act as scientists in these roles.
Humans as users and consumers. Humans as users and consumers often throw a wrench into cybersecurity. Users are commonly described as the weakest link in cybersecurity.
Humans as orchestrators and practitioners. Our goal is to defend a network, data, or users, and we decide how to achieve the desired goal. Defenders must be knowledgeable about the environment, the tools at their disposal, and the state of security at a given time. Human defenders bring their own limitations to cyber defense, including their incomplete picture of the environment and their human biases.
Humans as active adversaries. Human adversaries can be unpredictable, inconsistent, and irrational. They are difficult to attribute definitively, and they masquerade and hide easily online. Worse, the best human adversaries abandon specific attacks more quickly than defenders like you can discover them. Scientific inquiry in chemistry and physics has no analogous opponent.
Note
For a very long time, scientific inquiry was a solo activity. Experiments were done by individuals, and papers were published by a single author. However, by 2015, 90% of all science publications were written by two or more authors.6 Today there is too much knowledge for one person to possess on his or her own. Collaboration and diversity of thought and skill make scientific results more interesting and more useful. I strongly encourage you to collaborate in your pursuit of science, and especially with people of different skills.
Human Cognitive Biases
Cognitive errors and human cognitive biases have the potential to greatly affect objective scientific study and results. Bias is an often misused term that, when used correctly, describes irrational, systematic errors that deviate from rational decisions and cause inaccurate results. Bias is not the same as incompetence or corruption, though those also interfere with neutral scientific inquiry. The following three biases are especially important to watch for as you think about science.
Confirmation bias is the human tendency toward searching for or interpreting information in a way that confirms one’s preconceptions, beliefs, or hypotheses, leading to statistical errors. This bias is often unconscious and unintentional rather than the result of deliberate deception. Remember that scientific thinking should seek and consider evidence that supports a hypothesis as well as evidence that falsifies the hypothesis. To avoid confirmation bias, try to keep an open mind and look into surprising results if they arise. Don’t be afraid to prove yourself wrong. Confirmation bias prevents us from finding unbiased scientific truths and contributes to overconfidence.
Daniel Kahneman, author of Thinking, Fast and Slow, uses the acronym WYSIATI, for “what you see is all there is,” to describe overconfidence bias. Kahneman says that “we often fail to allow for the possibility that evidence that should be critical to our judgment is missing—what we see is all there is.” Without conscious care, there is a natural tendency to deal with the limited information you have as if it were all there is to know.
Cybersecurity is shaped in many ways by our previous experiences and outcomes. Hindsight bias is the tendency, once an outcome is known, to believe it was more predictable than it really was. For example, looking back after a cybersecurity incident, our CEO might judge that we “should have known,” assigning the incident a higher probability than anyone reasonably would have before it occurred. Hindsight bias leads people to say “I knew that would happen” even when new information has distorted their original thinking. Hindsight also causes us to undervalue the surprise of scientific findings.
As you pursue science and scientific experimentation, keep biases in mind and continually ask yourself whether or not you think a bias is affecting your scientific processes or outcomes.
The Role of Metrics
It’s easy to make a mental mistake by substituting metrics for science. Managers like metrics (the analysis of measurements over time) because they think these numbers alone allow them to determine whether the organization is secure or succeeding. Sometimes metrics really are called for. However, counting the number of security incidents at your company is not necessarily an indication of how secure or insecure the company is. Determining the percentage of weak passwords for your users is a metric, not a scientific inquiry. As we will see in Chapter 2, hypotheses are testable proposed explanations like “people take more risks online than in their physical lives.”
Don’t get me wrong: most experiments measure something! Metrics can be part of the scientific process if they are used to test a hypothesis. The topic of security metrics may also be the foundation for scientific exploration. The point is not to be fooled into believing that metrics alone can substitute for science. To learn more about the active field of security metrics, visit SecurityMetrics.org, which hosts a mailing list and an annual conference.
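To make the distinction concrete, the sketch below treats the weak-password percentage as a metric and then uses it to test a hypothesis: “the new password policy reduced the share of weak passwords.” It applies a one-sided two-proportion z-test (normal approximation with a pooled proportion), implemented with Python's standard library. The counts are invented for illustration and the function name is hypothetical.

```python
import math


def one_sided_two_proportion_z(weak_before, n_before, weak_after, n_after):
    """Test H0: the weak-password rate did not drop, against H1: it dropped.
    Uses the normal approximation with a pooled proportion. Returns (z, p_value)."""
    p_before = weak_before / n_before
    p_after = weak_after / n_after
    pooled = (weak_before + weak_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p_after - p_before) / se
    # Lower-tail probability of the standard normal: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value


if __name__ == "__main__":
    # Made-up counts: the same 18% -> 14% drop measured at two sample sizes.
    for weak_before, weak_after, n in ((18, 14, 100), (180, 140, 1000)):
        z, p = one_sided_two_proportion_z(weak_before, n, weak_after, n)
        print(f"n={n:>4} per group: z={z:.2f}, one-sided p={p:.3f}")
```

With these invented counts, the same four-point drop gives a one-sided p-value of about 0.22 with 100 users per group but about 0.007 with 1,000 per group. The metric alone does not tell you which situation you are in; the hypothesis test does. The normal approximation assumes reasonably large counts and independent users, and a before/after comparison like this cannot rule out other causes of the change.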
Conclusion
The key concepts and takeaways about the scientific method presented in this chapter and used throughout the book are:
Cybersecurity science is an important aspect of the understanding, development, and practice of cybersecurity.
Scientific experimentation and inquiry reveal opportunities to optimize and create more secure cyber solutions.
The scientific method contains five essential elements: ask a good question, formulate hypotheses, make predictions, experimentally test the predictions, analyze the results.
Experiments must be objective, falsifiable, reproducible, predictable, and verifiable.
The human elements of cybersecurity science are critical to designing accurate and unbiased experiments and to maximizing the practical usefulness of experiments.
References
William I. B. Beveridge. The Art of Scientific Investigation (Caldwell, NJ: Blackburn Press, 2004)
Lorraine Daston and Elizabeth Lunbeck (eds). Histories of Scientific Observation (Chicago: University of Chicago Press, 2011)
Richard Feynman. The Pleasure of Finding Things Out (2005)
Hugh G. Gauch, Jr. Scientific Method in Brief (Cambridge: Cambridge University Press, 2012)
Richard Hamming. You and Your Research (1986)
International Workshop on Foundations & Practice of Security
Roy Maxion. “Making Experiments Dependable.” In Dependable and Historic Computing, Lecture Notes in Computer Science, vol. 6875, pp. 344–357 (Heidelberg: Springer-Verlag, 2011)
1 Barriers to the Science of Security.
2 Hovav Shacham, Matthew Page, Ben Pfaff, Eu-Jin Goh, Nagendra Modadugu, and Dan Boneh. 2004. “On the effectiveness of address-space randomization.” In Proceedings of the 11th ACM Conference on Computer and Communications Security (CCS ’04). ACM, New York, NY, USA, 298-307.
3 Science of Cyber Security, MITRE Report JSR-10-102, November 2010, http://fas.org/irp/agency/dod/jason/cyber.pdf.
4 Reproducibility is not the same as repeatability or replicability.
5 Pascal: An Introduction to the Art and Science of Programming by Walter J. Savitch, 1984.
6 Enhancing the Effectiveness of Team Science, Nancy J. Cooke and Margaret L. Hilton (Eds.), http://www.nap.edu/catalog/19007/enhancing-the-effectiveness-of-team-science, 2015.