Bayesian Networks: One Solution for Specific Challenges in Building ML Systems in Cybersecurity
Rob Mealey, Director of Data Science, Resilience Insurance
Two concepts help explain the specific data challenges faced by, and the constraints placed on, machine learning systems developed to forecast and quantify cybersecurity risk. They apply to systems that can be classified as tactical or defensive in some way, such as an intrusion detection and prevention solution, as well as to systems that are more strategic and more concerned with cybersecurity risk in the context of the larger picture, such as those in the domain of cybersecurity insurance. The two concepts are:
- Data scarcity, where the challenge in modeling a system is that, for any number of reasons, no easily gathered, reliable datasets exist from which to model the relationships within the system and forecast its future behavior;
- Interpretability, where the constraint is that the model's conclusions and forecasts must be interpretable and explainable to a human subject‐matter expert, whether for business, regulatory, or any number of other reasons.
Data scarcity issues can take many forms. In some scientific and research domains, such as deep‐sea marine biology or other environmental research, the issue truly is a lack of data of any kind. Whether the cost of data collection is prohibitive or the events being studied are sufficiently rare or difficult to observe, ...