Threat Hunting with your hands tied – This is Big Data Part II


Threat hunting isn’t only about finding compromised assets, it’s also performing the predictive function of finding the holes a malicious attacker might take advantage of. As I mentioned last week, your customers are your best hunters, accessing your website in a million different ways, with a thousand different web browsers and hundreds of different types of devices. This doesn’t include the automated mass vulnerability scanners, such as Shodan or research projects like MassScan that are scrubbing your applications as well. Today I’ll share some of my queries and I hope you share some of your most recent hunting exercises and queries with me.

At JASK we utilize Hadoop and Zeppelin notebooks. This allows us to write functions in spark and query our data using spark-sql syntax. This also allows us to export notebooks in json to share with the security community, work with our customers and the threat hunting community to build even more powerful notebooks and applied research. Now onto the data.

Searching for DNS non-authoritative answers for customer domains:

The results showed a large number of hosts querying the internal DNS server for Example: The internal DNS server did not have a record for this, so the query would then be forwarded to an external DNS server. This looked strange and we realized this misconfiguration would point all users to their CMS licensing manager page since this particular domain was not registered under their license. I would categorize this as information disclosure, resulting in disclosing the CMS server version and dropping everyone to the admin login page of the CMS (both internal and external users). From this information disclosure it turns out they were running a vulnerable CMS version as well. Were they exploited yet? We had been in this POC for a few weeks and can query our data to determine if anyone accessed the CMS admin page while we have been in place. We are also able to close the loop and write a rule to produce a signal for logins to the admin page. Often times the business will decide this is not a risk and we simply keep it in our hunting notebook.

The zeppelin paragraph:

Building on the CMS information disclosure story we mentioned earlier. Here’s the query we used to perform a historical check and determine if anyone had accessed the vulnerable CMS.

Non-Standard software – User-Agents:

Most of the customers I’ve worked with function like the wild west, with BYOB and no managed software or hard and fast policies. Every now and again you get an easy one where the customer maintains an approved software list and possibly even an approved web browser. This makes for easy anomaly hunting or “Never have I seen X” type hunting. If we see anything that does not match the customers “approved” user-agent, we have a finding worth chasing. Below is a sample query, but usually you’ll add more to the query, an internal subnet to hunt or regex of acceptable user-agents. Below is a sample of a basic Zeppelin paragraph, I will leave the rest to your own imagination and hunting specific hunting exercise. Here we are looking for all IE 11 User Agents. This is to get your mind thinking, but this one is fairly simple for this post.

Maybe you just want to see what your TOP 10 Most popular User-agents are?

Maybe you just want the distinct User-Agents in your network? This query has found me anti-virus agents fetching update lists and validating the license key through a base64 encoded User-Agent string. Lame…

None of the above queries are all that efficient and depending on how tight lipped the network is the more clarity these queries can provide. Nesting queries can help clean the results and mean the difference between having a threat hunter analyze 100 results or 1,000’s.

Wasting your time searching for ad-trackers?

I’m not aware of what can be done here short of our government stepping in to protect our privacy and this hasn’t bore me much fruit in a hunt. It has found me people accessing inappropriate content in the workplace. Even while the organization had invested in a web proxy and end-point software to prevent adult content in the workplace. We could use this to validate the effectiveness of those automated content blocking tools and web proxies. Ad-tracker’s give up a lot of information about the quality of the website you are accessing and you just might find this query bearing fruit for you to find users searching websites in “poor” taste for the workplace. I find the more deceptive the ad-tracker, usually the dirtier the website. Here’s one of the most common ad-tracker’s I’ve seen recently.

Searching for plain text passwords floating around.

This one can be a bit noisy, so make sure to tighten it up after you scrub your first round of results with a few “not like” statements. We’ve found poor business applications with hardcoded passwords crossing the network boundary and floating around internally.

Searching for plain-text Protocols:

We all promise plaintext protocols are not allowed on the network, but we always find them. How about we take a look at the types of FTP activity happening and the exact commands that were run? One piece of information against logs for hunting. If you don’t control the FTP server, do you think the FTP server is going to send you the logs? This is the type of hunting that MUST be done with network data. Log data is a ho-hum source for hunting, maybe you have it, maybe you don’t. You just don’t know if you are getting the true results with logs, you never know which servers are logging. Sometimes the servers running are not yours, but a service a user throws up to get their job done quickly. That was the case with one of our most recent hunting exercises finding a quickly stood up FTP server on the internal network.

Maybe you are searching for anyone using those pesky Dell or IBM superfish root * certificates? This is just a dabble into the power of hunting based on TLS certificates, the cipher being used, and more. I’ve yet to find anything in a customer network related to weak ciphers or export encryption and that’s a good sign. TLS parameters are easy to hunt for and you should do it. It’s not always about what your certificates look like, but the certificates of the sites your users are interacting with. This might be the case with encrypted malware and TLS encrypted botnets using self-signed certificates or misconfigured certificates. Hackers make mistakes and it’s your job to catch their mistakes. They are doing a good job at catching ours.

The story goes on forever, are you focused on the perimeter and want to see any connections that were established from external to internal? We remove RFC 1918 space in this query. As we graduate our knowledge in Spark we begin to define variables utilize functions, but for this article you’ll see no variables are used and we simply code the customer’s used RFC 1918 private addresses into the query.

Still loving DNS and want to see your top 10 DNS queries? Your domain will likely be the top hit, go ahead and set it as a “Not like” and keep paring down those not like statements for a personal fit. Remember this is a write once, run many times hunt. Investing your time to write good queries the first time will result in a more efficient and quicker hunting exercise in the future.

Have any ugly buggers trying to perform DNS exfiltration? Try searching for DNS queries of long length. This is a pretty weak one and almost every hit ends up with spotify’s long DNS queries for playlists.

Weak Kerberos Ciphers?

RC4-hmac and DES are seen on Windows XP and up to Windows 2003 servers. It’s something most environments should be moving away from for obvious weak cipher reasons. This query is great for validating strong ciphers are used throughout an environment and calculating the risk associated with where these weak ciphers are occurring in your network.

Finally, let us not forget the world of executables. Those hundreds of thousands of dollars spent on full packet capture devices for the sole business purpose of extracting executables. Save yourself:

That’s a small sample of the 100’s of queries, paragraphs, and notebooks we’ve built at JASK for our customers to jump right into hunting in Big Data. We prefer to organize these queries into focused notebooks, such as DNS Security, HTTP, and TLS notebooks and run them at the notebook level vs. paragraph level, adding tremendous value and efficiency to a threat analytics program.

What to do with the results and wrapping up the Hunting Exercise.

Results are nothing if you can’t wrap them into the business process. When the hunting exercise is complete, take your query and turn it into signal intelligence to drive Artificial Intelligence. In JASK we have a rule engine for this exact design. Teach JASK a new skill and the AI becomes smarter. No security detection technology will catch everything, but when humans, customers, the data science, and security community are able to continually improve detection through hunting exercises and close the loop, we are one step closer to defending the business and turning hunting exercises into a repeatable process.

Happy Hunting!



Share on