Beyond SIEM: Evolving Correlation


Most SIEMs have a correlation engine where logic/rules are written against flows and logs to bring events to an analyst's attention. LogRhythm has its AI Engine rules, ArcSight has correlation rules, IBM QRadar has both rules and offenses, and Splunk has its saved searches/notable events. Some correlation engines are more powerful than others, but most provide roughly the same level of functionality in one way or another. These events usually get triaged by an analyst one by one, where they are reviewed and investigated. If an event is determined to be a false positive, the analyst tunes the rule or, in large SOCs, provides feedback to a content developer to tune it. It is a very manual process that needs to be measured consistently to make sure rules remain actionable.


There are a few different types of rules that are usually deployed in SIEMs. They are as follows:

A basic rule can be written to match one or two conditions in a log. For example, you could write a rule matching deviceVendor=Snort and deviceSeverity=High, which would trigger every time there is a high-severity Snort event. I wouldn't really call this true correlation, and I would not recommend this rule unless the SOC has a well-tuned IDS/IPS. In many cases, this type makes up the majority of SIEM rules.
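A rule like this is just a static condition check against single events. As a minimal sketch (the field names and event dictionaries below are illustrative, not any particular SIEM's schema):

```python
# Hypothetical sketch of a "basic" one-event match rule.
# Field names (device_vendor, device_severity) are assumptions,
# not a real SIEM's field schema.
def basic_rule(event):
    """Fire on any single event matching two static conditions."""
    return (event.get("device_vendor") == "Snort"
            and event.get("device_severity") == "High")

events = [
    {"device_vendor": "Snort", "device_severity": "High"},  # matches
    {"device_vendor": "Snort", "device_severity": "Low"},   # does not
]
alerts = [e for e in events if basic_rule(e)]
print(len(alerts))  # 1
```

Note that no state is kept between events, which is why this barely qualifies as correlation: each event is evaluated in isolation.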

A more complex rule alerts when one event is followed by another type of event, which requires the correlation engine to maintain state across multiple events. An example would be a proxy event where the URL ends with, or the response carries, a risky content type/MIME type (.exe, .doc, .jar, .vb, etc.), followed within a period of time (usually 5 to 10 minutes) by an outbound IDS/IPS malware event from the same source that downloaded the file. This is a bit more complex than the simple rules, and in some SIEMs you might have to use a list to maintain state, as the rule/correlation engine cannot do so on its own.
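The two-stage pattern above can be sketched in a few lines. This is a simplified model, not a real SIEM's rule language: the event handlers, field names, and the in-memory deque standing in for the engine's "list" are all assumptions.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)                  # correlation window
RISKY_EXTENSIONS = (".exe", ".doc", ".jar", ".vb")

# State the engine must hold: recent risky downloads as (time, source IP).
# In a SIEM without native state, this is the "list" you would maintain.
downloads = deque()

def on_proxy_event(ts, src_ip, url):
    """Stage 1: remember sources that downloaded a risky file type."""
    if url.lower().endswith(RISKY_EXTENSIONS):
        downloads.append((ts, src_ip))

def on_ids_event(ts, src_ip):
    """Stage 2: alert if the same source had a risky download in the window."""
    while downloads and ts - downloads[0][0] > WINDOW:
        downloads.popleft()                     # expire stale state
    return any(ip == src_ip for _, ip in downloads)

t0 = datetime(2019, 1, 1, 12, 0)
on_proxy_event(t0, "10.0.0.5", "http://example.test/payload.exe")
print(on_ids_event(t0 + timedelta(minutes=5), "10.0.0.5"))  # True: correlated
print(on_ids_event(t0 + timedelta(minutes=5), "10.0.0.9"))  # False: other host
```

The key point is the expiry step: if the IDS/IPS event arrives outside the window, the stored download has been pruned and no alert fires.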

There are also building-block rules that, in themselves, do not warrant triage, but combined with another variable, do. An example is an external port scan. That alone does not need to be triaged every time, but if an external port scan is followed by an outbound connection to the source IP that performed the scan (for example, an outbound connection that should not have been allowed), the activity could be an indicator of zero-day behavior that most IDS/IPS will not yet have signatures for.

Building-block rules can also be used to start doing very basic UBA. For example, every time a user logs into a device, one rule adds the username and IP to a list; another rule (or the same rule, depending on the correlation engine) does a lookup, and if the username and IP are not in the list, it fires to indicate the first time the user has accessed that device. These lists become hard to maintain over time, as there are usually limits to how many rows can be stored before the correlation engine takes a performance hit.
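The first-seen lookup described above is essentially set membership. A minimal sketch, with an in-memory set standing in for the SIEM's lookup list (the function name and cap are illustrative assumptions):

```python
# Hypothetical first-access detector; `seen` plays the role of the
# SIEM's lookup list of (username, IP) pairs.
seen = set()

def first_access_signal(username, ip):
    """Fire a signal the first time this user accesses this device."""
    key = (username, ip)
    if key in seen:
        return False        # known pairing, stay quiet
    seen.add(key)           # note: unbounded here; real SIEM lists cap
    return True             # the row count, which is the maintenance pain

print(first_access_signal("alice", "10.0.0.7"))  # True  (first sighting)
print(first_access_signal("alice", "10.0.0.7"))  # False (already seen)
```

In practice the set grows with every new user/device pairing, which mirrors the row-limit problem: without an eviction or aging policy, the list eventually hits the engine's storage ceiling.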

Where SIEM correlation engines fall short is the lack of a built-in feedback mechanism that makes the correlation smarter. SIEM correlations are not good at holding state for long periods of time. Most engines cannot tie correlations to entities (users, IPs, hosts), leaving the analyst with a very manual investigative process. Lastly, baselining user activity is near impossible, which is why the whole UBA market was established.

The JASK ASOC platform includes an anomaly detection engine that baselines network and user activity and generates signals that feed into JASK's ML/AI engine. JASK also uses patterns that are similar to rules, but instead of alerting an analyst, they create a signal that is fed into the same engine.

There are a couple of differences in this engine compared to traditional SIEM engines. First, JASK collects session data (Layer 7) from its network sensor, which allows writing patterns against data not seen in traditional SIEMs. It also has built-in YARA pattern capability, which is nice for a modern SOC. Each time a pattern or anomaly detection fires, it is called a signal. All signals are tied to entities, giving JASK a combination of UBA and SIEM and eliminating the need for two separate products. Lastly, at a high level, the AI/ML engine uses supervised learning and looks to draw connections between the insights (collections of related signals with interesting security context) we have seen and provided feedback on in the past and the current insights.

These insights bring all related context and signals to the forefront for the analyst, telling the story of what took place. The signals could be days or weeks apart, and the JASK platform is able to keep track of how they are related. Most SIEMs cannot track related correlations, so the analyst has to piece together the story by running additional searches. Analysts must provide feedback and scoring on an insight before closing it out. This feedback loop is missing in most SIEMs and should be kept in mind when looking to replace one.
