Applied Machine Learning in Security Part 1: The Introduction for Skeptics

Why Machine Learning (in Security)?

We often think of the day-to-day work of human analysts in computer security as finding a needle in a haystack or connecting the dots. But how do we find the dots in the first place? And how did they end up in such a nice arrangement? A real-world SOC (Security Operations Center) looks much more like this.

Human analysts are tasked with identifying the dots, forming the map, and connecting the dots, usually all in their own head, in real-time, and in the midst of a crisis.

We are told that machine learning is here to help. After all, if machine learning methods can drive a car, surely they can help in the SOC.

“Look Ma, no steering wheel!”

But what’s going on behind the scenes is rather complicated!

An overlay of detected objects along with classifications and modeled future movement. Automated driving systems perform a complex set of predictive analytics with continual feedback.

And let’s face it: in security, machine learning and data analytics in general have so far been less than “stellar” at delivering on their promises. Before we discuss why this is, and how we might be able to do it right, we need to make sure we have a common basic understanding of machine learning.

Let’s All Get on the Same Page

All analytics and algorithms are composed of 1) a model or transformation and 2) an evaluation method.

In security, models have traditionally taken the form of simple pattern-matching queries, such as those specified in Snort rules or regular expressions. More recently, analysts have tried “data science” approaches such as neural networks, time series models, and the like.

We reviewed in an earlier blog post how models are evaluated. The key takeaway is that there is a tradeoff between the number of false positives (FP) and false negatives (FN). Better models allow for a more advantageous tradeoff, but the tradeoff itself never goes away, no matter how good the model.
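
To make the tradeoff concrete, here is a minimal sketch. The scores and thresholds are made up for illustration and do not come from any real detector; the only point is that moving the alert threshold trades one kind of error for the other.

```python
# Hypothetical anomaly scores (higher means "more suspicious").
# These values are invented for illustration only.
benign_scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.60]
malicious_scores = [0.40, 0.50, 0.65, 0.70, 0.85, 0.90]


def count_errors(threshold):
    """Return (false_positives, false_negatives) when alerting on score >= threshold."""
    false_positives = sum(1 for s in benign_scores if s >= threshold)
    false_negatives = sum(1 for s in malicious_scores if s < threshold)
    return false_positives, false_negatives


# Sweep a few thresholds: a low threshold alerts on more benign events,
# a high threshold misses more malicious ones.
for threshold in (0.3, 0.5, 0.7):
    fp, fn = count_errors(threshold)
    print(f"threshold={threshold:.1f}  FP={fp}  FN={fn}")
```

On this toy data no threshold gives zero errors of both kinds, because the benign and malicious scores overlap. A better model would shrink that overlap, but the threshold would still have to sit somewhere.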

The following graph shows both a model and its evaluation function. Let’s say we have the set of points shown, and we think the data should fit a line. This line is our model. The idea may or may not be right, but it is our working assumption. We evaluate each candidate line by the distance from each data point to that line. That’s it: the line with the lowest overall distance to all the points becomes our selected, or trained, model.
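
As a rough sketch of that idea, the snippet below fits a line to a handful of invented points. It assumes that "distance" means the squared vertical distance from each point to the line (ordinary least squares); the text above does not say which distance is used, so treat that choice, the data, and the function names as assumptions.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def total_squared_error(xs, ys, slope, intercept):
    """Evaluation function: sum of squared vertical distances to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Invented data points that roughly follow a line.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(xs, ys)
fitted_error = total_squared_error(xs, ys, slope, intercept)

# A hand-picked candidate line for comparison (y = 2x + 1).
guess_error = total_squared_error(xs, ys, 2.0, 1.0)

print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")
print(f"fitted error = {fitted_error:.3f}, guessed line error = {guess_error:.3f}")
```

The model is the line and the evaluation is the error function. "Training" here just means picking the line that the evaluation function scores best.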

Contextual Detection

How does all this apply to doing proper machine learning for security? We’ll be giving a full talk at an upcoming security conference and posting a set of follow-up articles here. They will explain how the majority of security products to date have yielded an ever-growing number of targeted detections, each of which, by definition, can be only partially accurate on its own. The data rates in security are so high that even highly accurate detectors would produce a large absolute number of false positives. So this is where we are.
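
A back-of-the-envelope calculation shows why the absolute numbers get large. The event volume and false positive rate below are hypothetical figures chosen for illustration, not measurements from any product or network.

```python
def expected_false_alerts(benign_events_per_day, false_positive_rate):
    """Expected number of benign events that trigger an alert each day."""
    return benign_events_per_day * false_positive_rate


# Hypothetical inputs, not measurements.
BENIGN_EVENTS_PER_DAY = 100_000_000   # assumed daily event volume
FALSE_POSITIVE_RATE = 0.001           # detector is right on 99.9% of benign events

false_alerts = expected_false_alerts(BENIGN_EVENTS_PER_DAY, FALSE_POSITIVE_RATE)
print(f"{false_alerts:,.0f} false alerts per day")
```

With these made-up inputs, a detector that is right about 99.9% of benign events still produces on the order of 100,000 false alerts a day, far more than a human team can triage.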

Fortunately, there are mathematically valid methods to construct and combine detectors to significantly reduce false positives without sacrificing detection rates. One method is to condition one detection on another. Each added condition further reduces the sample space. The final FP/FN performance will still be governed by the give-and-take relationship that applies to all classifiers, but the detector will now be operating on a subset that is richer in relevant data.
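
Here is a toy sketch of conditioning one detection on another. Everything in it is hypothetical: the `Event` fields, the two detectors, and the eight events were invented so the effect is easy to see, and they are not meant to represent a real C2 detector.

```python
from typing import Callable, List, NamedTuple, Tuple


class Event(NamedTuple):
    host: str
    periodic_traffic: bool   # hypothetical signal: beacon-like timing
    rare_destination: bool   # hypothetical signal: seldom-seen destination
    is_malicious: bool       # ground truth, known only in this toy data


# Invented events for illustration.
EVENTS: List[Event] = [
    Event("h1", True, False, False),
    Event("h2", True, False, False),
    Event("h3", False, True, False),
    Event("h4", True, True, True),
    Event("h5", False, False, False),
    Event("h6", True, True, True),
    Event("h7", False, True, False),
    Event("h8", True, True, False),
]

Detector = Callable[[Event], bool]


def beacon_detector(event: Event) -> bool:
    return event.periodic_traffic


def rare_destination_detector(event: Event) -> bool:
    return event.rare_destination


def conditioned(first: Detector, second: Detector) -> Detector:
    """Build a detector where `second` only fires on events `first` already flagged."""
    def detector(event: Event) -> bool:
        return first(event) and second(event)
    return detector


def evaluate(events: List[Event], detector: Detector) -> Tuple[int, int, int]:
    """Return (true_positives, false_positives, false_negatives)."""
    tp = sum(1 for e in events if detector(e) and e.is_malicious)
    fp = sum(1 for e in events if detector(e) and not e.is_malicious)
    fn = sum(1 for e in events if not detector(e) and e.is_malicious)
    return tp, fp, fn


beacon_alone = evaluate(EVENTS, beacon_detector)
beacon_given_rare = evaluate(
    EVENTS, conditioned(rare_destination_detector, beacon_detector)
)
print("beacon alone        (TP, FP, FN):", beacon_alone)
print("beacon | rare dest. (TP, FP, FN):", beacon_given_rare)
```

In this toy data the conditioned detector keeps both true positives while dropping two of the three false positives, because the second detector only looks at the smaller set of events the first one flagged. Real data will not be this clean: if the first detector misses a malicious event, the conditioned detector misses it too, which is the give-and-take described above.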

However, to build a system based on these sorts of conditional detections, data must be organized prior to detection; otherwise, detections will continue to be isolated and human analysts will continue to have to (try to) manually correlate them. It is past time to leverage computation for the grunt work of organizing data so that human analysts can be freed to do the creative work in security.

Our follow-on posts will get into much more detail, including specific examples for C2 detections, and we will add the related speaking schedule here once it’s announced. Hope to see you there!

