On the Hunt Part 2: Identifying Spear-Phishing Recon Activity (How Ads Collect User Details for Spear-Phishing Campaigns)

A few weeks ago, I published an article on Base64 decoding. Its findings ranged from process ID numbers and application and version detection to the blatant collection of email addresses. With that in mind, today I’m going to focus on ads. Not adware, not malvertising, just ads. Ads are a massive security hole in our networks and an invasive species in our personal lives. Inspired by the Grizzly Steppe news, I’m looking at how effectively ads can support the Reconnaissance Phase of a strong spear-phishing attack.

I see targeted advertising day in and day out, but how much personally identifiable information is being collected about users? How much of the internet has been consumed by ads? How can we tell heads from tails, or good ads from bad? I’m casting my net wide to catch what spear-phishing attackers might later use as shrimp (bait). With my morning coffee in hand and a comfy seat on the office couch, here is the notebook we are going to start with.

Base64 Decoding Emails

This is a slightly modified version of the notebook from last week’s Base64 decoding blog post, focused on decoded strings that contain email addresses (found by adding a simple search for the “@” symbol in the decoded URI string).
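
The notebook itself isn’t reproduced here, so as a rough stand-in, this Python sketch shows the general idea: split a URI into path segments and query-parameter values, try to Base64-decode each value (standard or URL-safe alphabet), and keep any decoded string that contains an “@”. The helper names and the 16-character minimum are assumptions for illustration, not the notebook’s actual logic.

```python
import base64
import re
from typing import Iterator, List, Optional
from urllib.parse import unquote, urlsplit

# A whole value made of standard or URL-safe Base64 characters, optionally
# padded. The 16-character minimum is an assumed cut-off to skip short noise.
B64_VALUE = re.compile(r"[A-Za-z0-9+/_-]{16,}={0,2}")


def try_b64decode(token: str) -> Optional[str]:
    """Decode a standard or URL-safe Base64 token to UTF-8 text, else None."""
    normalized = token.replace("-", "+").replace("_", "/")
    normalized += "=" * (-len(normalized) % 4)  # restore stripped padding
    try:
        return base64.b64decode(normalized, validate=True).decode("utf-8")
    except ValueError:  # binascii.Error and UnicodeDecodeError both subclass it
        return None


def candidate_values(uri: str) -> Iterator[str]:
    """Yield path segments and query-parameter values, percent-decoded."""
    parts = urlsplit(uri)
    for segment in parts.path.split("/"):
        yield unquote(segment)
    for pair in parts.query.split("&"):
        _, _, value = pair.partition("=")
        # unquote (not unquote_plus) keeps a literal '+' from Base64 intact.
        yield unquote(value)


def emails_in_uri(uri: str) -> List[str]:
    """Return decoded Base64 values from a URI that contain an '@'."""
    hits = []
    for value in candidate_values(uri):
        if B64_VALUE.fullmatch(value):
            decoded = try_b64decode(value)
            if decoded is not None and "@" in decoded:
                hits.append(decoded)
    return hits
```

Splitting on URI structure first, rather than regex-scanning the whole string, keeps a URL-safe token in a path from being glued to the neighboring path segments. The trade-off is that a standard-alphabet token containing “/” inside a path will be split and missed.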

I would love to show you the results, but to protect users’ email addresses, I’m going to ask you to trust me: there are hits. I’m not saying buckets of emails are leaving the network, but a handful of the most clearly identifying pieces of information ([email protected] and [email protected]) are being collected by marketing and ad agencies around the world (KR, SG, US).

What information can you glean from an email address? I can easily identify where each of these people works from the email domain. From the Gmail, Yahoo, and other webmail addresses, I can also identify which login portals to imitate when I aim my exploit and go spear phishing. With emails in hand, we’ve proven the first step: ads are collecting email addresses. Remember, this is being done at the ad agency level, to supply the agency’s customers with perfectly targeted ads and to learn as much as possible about each person being targeted. Only good things come from targeted ads.
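
To make the domain point concrete, here is a minimal Python sketch, assuming you already have a list of decoded addresses, that buckets them by domain into corporate and webmail. The short WEBMAIL_DOMAINS set is a hypothetical placeholder; a real analysis would need a much longer, maintained list. Read from the hunting side, the corporate bucket shows which organizations’ employees are leaking identities to ad networks.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical, deliberately short list of consumer webmail domains.
WEBMAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com"}


def summarize_email_domains(emails: Iterable[str]) -> Dict[str, Counter]:
    """Count addresses per domain, split into corporate vs. webmail buckets."""
    summary: Dict[str, Counter] = {"corporate": Counter(), "webmail": Counter()}
    for email in emails:
        _, at, domain = email.strip().lower().rpartition("@")
        if not at or not domain:
            continue  # not an address we can attribute to a domain
        bucket = "webmail" if domain in WEBMAIL_DOMAINS else "corporate"
        summary[bucket][domain] += 1
    return summary
```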

Based on the data the ad agency collects, I, as an attacker and paying customer, can ask the agency to target everyone who works at (Company X) with my exploit. Job complete; why am I even writing this post? Why do attackers even waste time on recon? Real-time human tracking is the core competency and business of ad agencies. They track and categorize people across mobile, TV, and PC, and they are really good at what they do.

Maybe I’m being paranoid. I need more facts to back up my suspicion of “innocent” ad companies tracking everyone. Maybe users are visiting the dark web or risky websites that host malicious ads, and I’m making assumptions. Maybe it’s one country, like Russia, targeting users? How hard is attribution? Let’s go to our data:

Top 10 Non-US Traffic Destinations:

You can clearly see that a large portion of traffic is going to KR (South Korea), and you might congratulate yourself: “Aha, the South Koreans are after me! It really was my users going to risky websites.” Don’t. Ask the data instead. Let’s pivot to the suspicious-countries notebook and analyze HTTP requests. For this, we’ll use our URL-parsing UDF to parse out the domain name for quick and easy viewing of what all this KR traffic is about.
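
The UDF and notebook aren’t shown here. As an illustration only, this Python sketch ranks non-US destination countries and then, for one country, breaks its requests down by hostname. The record fields (dst_country, url) are hypothetical names for whatever your HTTP logs provide, and url_host uses urllib.parse as a stand-in for the URL-parsing UDF. It groups by full hostname; collapsing hosts to registered domains (for example, anything under example.co.kr) would need a public suffix list.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Optional, Tuple
from urllib.parse import urlsplit

Record = Mapping[str, str]  # hypothetical log record: {"dst_country": ..., "url": ...}


def url_host(url: str) -> Optional[str]:
    """Stand-in for the URL-parsing UDF: return the lowercase hostname or None."""
    if "//" not in url:
        url = "//" + url  # urlsplit only recognizes a host after '//'
    return urlsplit(url).hostname


def top_non_us_countries(records: Iterable[Record], n: int = 10) -> List[Tuple[str, int]]:
    """Rank destination countries by request count, excluding the US."""
    counts = Counter(
        r["dst_country"] for r in records if r.get("dst_country") not in (None, "US")
    )
    return counts.most_common(n)


def host_share(records: Iterable[Record], country: str) -> List[Tuple[Optional[str], float]]:
    """Fraction of one country's requests that went to each hostname."""
    hosts = Counter(url_host(r["url"]) for r in records if r.get("dst_country") == country)
    total = sum(hosts.values())
    return [(host, count / total) for host, count in hosts.most_common()] if total else []
```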

What Is the Traffic to South Korea?

Traffic to South Korea Query:

100% of my South Korea traffic is going to Yahoo.

The data shows that users weren’t going to risky websites when their traffic went to South Korea (they went to Yahoo). What the data does support is that the internet is a global service and ads are served from all over the world. Putting this together, you start to see why many of the IOCs in the Grizzly Steppe release are spread all over the world and come from trusted sources. It’s impossible to attribute an attack based solely on the GEO location of network traffic sources or destinations. Notice that Russia is nowhere to be seen; in fact, I haven’t seen as much Russian traffic lately as I did in previous years.

As an attacker, I could leverage an ad or marketing agency to pinpoint exactly whom I want to target. The ad is not malware and it’s not malicious (though I’d argue that an ad tracking me from my phone to my PC to my TV is malicious), and, unfortunately, it’s not illegal. It’s highly efficient reconnaissance, and attackers will, and should, take advantage of this service.

Why port scan, crawl URLs, or run Recon-ng when I can pay an ad network to supply everything I need? An attacker can sit quietly on the sidelines, prepping an exploit while outsourcing the recon to ad agencies. This makes the Reconnaissance Phase difficult to discover: by leveraging ad agencies, an attacker can jump straight to the Delivery Phase of the Kill Chain.

Let’s expand on the idea of a global internet with ads hosted around the world. What is the main source and subject of international traffic? What is Yahoo delivering to me from South Korea? I live in San Francisco, so why does so much of my traffic get processed and delivered by another country? What is coming from these international locations? To help answer these questions:

Top 10 Non-US HTTP Domains

Ads, ads, ads, and more ads: smartadserver, lijit, stickyadstv, adsrvr, bluekai, google-analytics. All of it is ads. This new internet is depressing me. It seems the majority of international traffic goes to ad networks.
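
One crude way to put a number on “it’s all ads” is to tag hostnames against known ad-network names. This Python sketch assumes you have (host, request_count) pairs from a top-domains query and matches substrings taken from the ad and analytics domains named in this post. Substring matching is a rough heuristic; a maintained ad/tracker blocklist would be a better source of markers.

```python
from typing import Iterable, Tuple

# Markers taken from the domains named in this post; not a complete list.
AD_MARKERS = (
    "smartadserver", "lijit", "stickyadstv", "adsrvr", "bluekai",
    "google-analytics", "adnxs", "googlesyndication", "springserve", "taboola",
)


def is_ad_host(host: str) -> bool:
    """True if the hostname contains any of the ad-network markers."""
    host = host.lower()
    return any(marker in host for marker in AD_MARKERS)


def ad_share(top_domains: Iterable[Tuple[str, int]]) -> float:
    """Share of request volume, over (host, count) pairs, that goes to ad hosts."""
    pairs = list(top_domains)
    total = sum(count for _, count in pairs)
    ads = sum(count for host, count in pairs if is_ad_host(host))
    return ads / total if total else 0.0
```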

In business, these collected emails, user IDs, and application version detections are used to display “relevant” ads for things I’m never going to buy (but thanks for the effort). In an attack scenario, that same data will be used to determine which sites Daniel visits regularly, where else he has accounts, and what his lifestyle looks like in general. With that information in hand, my spear-phishing campaign starts to look more like spearfishing in a stocked pond.

Let me step back to Grizzly Steppe and shared hosting. Many of the IOCs in the Grizzly Steppe report were on domains like Yahoo, BlueOcean, and other multi-tenant platforms. Does our data tell us a story about this?

Top 10 Destination Organizations:

The Top 10 Destination Organizations are ALL major platforms for advertising, lead generation, and marketing. AppNexus hosts adnxs, Google serves googlesyndication, Amazon hosts springserve, and Akamai fronts taboola. Thinking about it, this is no surprise. Every webpage carries a dozen ads, so clean internet traffic gets washed out by ad traffic. Come to think of it, I could ask my data for the real average number of connections per webpage request; maybe that’s a good indicator of risky websites? I’m an idea machine that never stops producing, and I’m going to work on that, but this blog post has to end at some point, because it’s Friday and my coffee is now cold, and not in a good cold-brew way.
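
The connections-per-page idea could be sketched roughly like this, assuming each HTTP log record carries the requested url and a hypothetical referrer field holding the HTTP Referer header. Requests are attributed to the page that referred them, query strings are ignored so one page counts once, and the distinct hosts per page are averaged. This is only an approximation: ads often load further ads from inside their own frames, so a Referer-based count will undercount the full chain, and nothing here has been validated as an indicator of risky websites.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Mapping, Set
from urllib.parse import urlsplit


def hosts_per_page(requests: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count distinct hosts contacted on behalf of each referring page."""
    pages: Dict[str, Set[str]] = defaultdict(set)
    for req in requests:
        referrer = req.get("referrer")  # hypothetical field for the Referer header
        if not referrer:
            continue  # cannot attribute this request to a page
        # Treat the same page with different query strings as one page.
        page = urlsplit(referrer)._replace(query="", fragment="").geturl()
        host = urlsplit(req["url"]).hostname
        if host:
            pages[page].add(host)
    return {page: len(hosts) for page, hosts in pages.items()}


def average_hosts_per_page(requests: Iterable[Mapping[str, str]]) -> float:
    """Mean distinct hosts per page; 0.0 when nothing can be attributed."""
    counts = hosts_per_page(requests)
    return mean(list(counts.values())) if counts else 0.0
```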

What is possible with targeted ads?

Outside of Grizzly Steppe and the DNC attack, let’s bring this closer to home with a real-world example. If you work at a publicly traded company, it’s predictable that employees will visit financial sites to check the company’s stock ticker and see how its shares are doing. Sounds reasonable.

I’m going to hire the ad agency to target my ad using a few criteria. First, users who are currently running a vulnerable version of an application (detected by the ad agency’s PID and application version detection). Second, only users who work at the target organization. The ad agency knows who I work for because it has tracked the websites I’ve visited for months or years (it hosts ads on most of the internet; think Facebook ads), and also from the GEO organization of my src_ip.address. Third, pinpoint the ad directly to [email protected], whose email the agency captured in a previous ad campaign. Now tie all of these metadata pieces together and fire off the ad. I could also reverse the campaign: instead of targeting a single known email, ask the agency for the email of the Sales VP who works at Jask. The ad agency has a collection of emails at the target organization and has automatically enriched them with each person’s title and importance within the organization from a site such as LinkedIn.

How about an email that reads something like this one? “This month’s ESPP paperwork needs to be electronically signed. Please log in at the link provided or open the attachment, sign, and respond to approve this quarter’s shares. This must be completed by Friday, as we did not receive your response to our previous email.”

We target this email at users within the target organization who have a known application weakness (from the ad agency’s collection of running processes). Sip some coffee… get a little anxious… sip some more coffee… profit.

