A recent report by Critical Start indicates that SOCs generate more alerts than analysts can process. Thirty-nine percent of analysts ignore certain categories of alerts, and 38 percent turn off high-volume alerting. I have seen analysts use other strategies to cut down on the volume, such as looking only at high-severity alerts or only at very specific alerts of interest. This is a fundamental problem that can be solved in a couple of different ways.
First, let’s look at why this is a problem. Often an alert fires on a single occurrence of something suspicious or malicious – an IDS alert, for example. I experienced this both as an analyst in the Air Force and at the FBI ESOC. I would start investigating the alert and quickly realize I did not have enough information to determine whether the activity was malicious or benign. I would then run additional searches to see what other alerts or events took place from the same IP or hostname. This leads to the second issue with event-driven SIEMs.
When an alert fires, an analyst has to run additional searches to see what other activity is related. There is no relationship between alerts that occur on the same host; every alert that comes in has to be investigated individually. Sixteen different alerts could fire from the same host over the course of a day, and as an analyst, I would have to rebuild that relationship to the host each time a new alert triggered.
Let’s look at an example:
8:00 AM Monday -> HostA -> Alert = User Clicked on Phishing URL
8:00 PM Monday -> HostA -> Alert = Endpoint Detected New Process
2:00 PM Tuesday -> HostA -> Alert = Outbound Connection to Known IOC
An analyst investigates the first alert at 8:00 AM Monday on HostA (user clicked on phishing URL) by looking at everything to and from HostA for the past couple of hours and determines that, with the data available, nothing malicious has occurred. The second alert fires on HostA at 8:00 PM Monday; after a shift change, a new analyst investigates it by looking at all events related to HostA and determines the new process was benign. At 2:00 PM Tuesday, another analyst investigates the third alert from HostA and comes to the same conclusion that nothing malicious has occurred. What is missing? The analysts have no idea how these alerts are related. The only way to understand this is to manually connect the dots and build a picture of how the alerts relate to one another.
Now, one might say an analyst can simply expand the search window to a full day each time an alert triggers to make sure they see all the related alerts. But what about multiple days – for example, if the alerts take place over a 10-day period? This quickly becomes resource-intensive in the SIEM and takes the analyst much longer to work through their alerts.
How is this problem solved? Jack Crook, Principal Incident Responder for GE-CIRT and finder of bad guys, said it best in his tweet below: “Increase confidence by alerting on clusters.”
This is exactly how JASK solves the problem of alert overload. First, we have a concept of entities, where an entity can be a user, IP, or hostname. As alerts – or signals, as we refer to them at JASK – trigger, they get related to a user, IP, or hostname. Each time a signal enters the Adaptive Signal Clustering engine, it looks back over the past 14 days (configurable) at all the signals associated with that entity. It weighs the uniqueness and severity of those signals and generates an Insight when it reaches a certain confidence level.
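As an illustration only – not JASK’s actual implementation – the idea can be sketched as: group signals by entity, keep only those inside the lookback window, and score the cluster on the uniqueness and severity of its signals. The field names, weights, and threshold below are all assumptions made for the sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta

LOOKBACK = timedelta(days=14)   # configurable lookback window
CONFIDENCE_THRESHOLD = 10       # illustrative threshold, not a real JASK value

def cluster_and_score(signals):
    """signals: list of dicts with 'entity', 'name', 'severity', 'time'."""
    by_entity = defaultdict(list)
    for s in signals:
        by_entity[s["entity"]].append(s)

    insights = []
    for entity, sigs in by_entity.items():
        # Only consider signals inside the lookback window ending at the
        # most recent signal for this entity.
        latest = max(s["time"] for s in sigs)
        window = [s for s in sigs if latest - s["time"] <= LOOKBACK]
        unique_names = {s["name"] for s in window}
        # Each unique signal name counts once, weighted by its max severity,
        # so repeats of the same noisy signal do not inflate confidence.
        score = sum(max(s["severity"] for s in window if s["name"] == n)
                    for n in unique_names)
        if score >= CONFIDENCE_THRESHOLD:
            insights.append({"entity": entity,
                             "unique_signals": len(unique_names),
                             "total_signals": len(window),
                             "score": score})
    return insights
```

Feeding in the three HostA signals from the example above would produce a single cluster for HostA, whereas an alert-driven workflow would have surfaced them as three unrelated investigations.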
What does it look like when an Insight triggers inside the JASK system? Below, we can see the name of the Insight: Exploit Delivery with Command and Control and Data Exfiltration. This name is auto-generated from the MITRE ATT&CK stages associated with each signal. We can see there are 4 unique signals out of 13 total signals related to the IP. We can also see that like signals – signals with the same name – are grouped together and can be expanded during an investigation. Lastly, we can see the span from the first signal, which triggered on 8-29, to the last signal, which caused the Adaptive Signal Clustering engine to trigger an Insight on 9-11. That time span, or dwell time, was calculated as 13 days.
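The dwell-time figure is simply calendar arithmetic between the first signal in the cluster and the one that triggered the Insight – a minimal sketch (the year here is assumed for illustration):

```python
from datetime import date

# Dwell time: elapsed days between the first signal in the cluster and the
# signal that caused the Insight to trigger. Year is assumed for this sketch.
first_signal = date(2018, 8, 29)
last_signal = date(2018, 9, 11)
dwell_days = (last_signal - first_signal).days  # 8-29 to 9-11 -> 13 days
```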
We also have the flexibility to create custom Insights. If you know a particular alert/signal is bad every time, the Adaptive Signal Clustering engine can be bypassed to create an Insight every time that signal triggers. What is nice is that when the custom Insight triggers, it still shows all the other signals related to the entity.
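Conceptually, a custom Insight is a simple bypass rule. The sketch below is hypothetical – the signal name and field layout are assumptions, not JASK’s API – but it captures the behavior: a trusted signal name creates an Insight immediately while still attaching the entity’s other recent signals for context.

```python
# Hypothetical bypass rule for custom Insights: trusted signal names skip
# cluster scoring but still carry the entity's other signals as context.
ALWAYS_INSIGHT = {"Outbound Connection to Known IOC"}  # assumed example name

def maybe_custom_insight(signal, recent_entity_signals):
    """Return an Insight dict immediately for trusted signals, else None."""
    if signal["name"] in ALWAYS_INSIGHT:
        return {
            "entity": signal["entity"],
            "trigger": signal["name"],
            # Related signals are still shown alongside the custom Insight.
            "context_signals": [s["name"] for s in recent_entity_signals],
        }
    return None
```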
What JASK customers like most about clustering is the flexibility it allows when creating detection logic. It gives them the ability to create detection logic that does not mean much when it triggers by itself but becomes meaningful when it triggers alongside a bunch of other alerts/signals – alerts that might have been filtered out in the past to decrease volume. There is a new term going around the industry for these: weak signals. Steve Miller at FireEye described weak signals as “features” of intrusion activity and malware that are inherently non-evil but are uncommon or rare enough to be useful, which can then be combined to surface sets of activity that are especially unusual or interesting.
This is an example of a weak signal from a Bro dns.log inside the JASK platform. We’re looking for high entropy based on the domainName together with the alexaDomainRank. On its own this would be noisy, but combined with other signals it makes a perfect candidate for a weak signal.
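For the intuition behind that check, here is a rough sketch in Python. The Shannon entropy function is standard; treating a missing alexaDomainRank as “unpopular” and the 3.5 threshold are assumptions for illustration, not the actual JASK rule.

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Shannon entropy (bits per character) of a string.
    DGA-style random domains score high; dictionary words score low."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def is_weak_signal(domain_name, alexa_domain_rank, entropy_threshold=3.5):
    # Fire only when the domain is both high-entropy and not a popular
    # (Alexa-ranked) site. Threshold and rank handling are illustrative.
    unranked = alexa_domain_rank is None
    return unranked and shannon_entropy(domain_name) > entropy_threshold
```

On its own this would fire on plenty of benign CDN or telemetry domains, which is exactly why it belongs in a cluster rather than in an analyst’s alert queue.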
In conclusion, one might wonder just how effective clustering is versus being alert-driven. Below are some real numbers from a few recent POCs. We can see there is no way to investigate all 4.7 million signals/alerts, even with a small army! What has JASK seen as the best event types to cluster? A combination of OS logs, EDR/AV, proxy, Bro, and IDS/IPS. In the next blog, I will talk about a newly released feature called suppression, which will make this ratio go down even more.