Learning from DNS

If you step back and examine the domain names observed in almost any network, you can roughly sort them into three pseudo-classes: dead, alive, and about to be born. In loose technical terms, dead domain names are those that are parked or simply pointed at a sinkhole. Alive domain names are the ones that currently facilitate some sort of service (benign or malicious). Finally, the “about to be born” domain names reflect what humans or automata (malicious or not) are requesting, but they are not currently registered and do not resolve.

In this blog post, I will examine the “about to be born” domain names (or simply NXDomains). You can view NXDomains in the following way: there is an unknown person outside your residence waving a loaded gun, and occasionally it points towards your house. Personally, I would prefer to detect and report this behavior as early as possible, so the suspicious actor can be stopped before there are any victims.

In the context of DNS, malware that employs domain generation algorithms (DGAs) typically leaves strong traces of NXDomain traffic. This happens because only some of the domain names the malware queries are registered; most of them are not, so those lookups fail. Wouldn’t it be nice if we could spot these suspicious NXDomain names, and the DGA activity as a whole, before it becomes too hard to deal with?
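
To make the NXDomain trace described above concrete, here is a minimal sketch in Python that counts, per client, how many distinct names failed to resolve and flags clients where that count is high. This is not how Pleiades works; it is only a naive illustration. The record layout `(client_ip, qname, rcode)`, the function names, the sample data, and both thresholds are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical DNS log records: (client_ip, queried_name, response_code).
SAMPLE_RECORDS = [
    ("10.0.0.5", "kq3vz1x.example", "NXDOMAIN"),
    ("10.0.0.5", "p0wq9rt.example", "NXDOMAIN"),
    ("10.0.0.5", "zz81hfa.example", "NXDOMAIN"),
    ("10.0.0.5", "www.example.com", "NOERROR"),
    ("10.0.0.9", "www.example.com", "NOERROR"),
    ("10.0.0.9", "typo-exmaple.com", "NXDOMAIN"),
]


def nxdomain_stats(records):
    """Return {client: (unique_nxdomain_names, total_queries)}."""
    totals = defaultdict(int)
    nx_names = defaultdict(set)
    for client, qname, rcode in records:
        totals[client] += 1
        if rcode == "NXDOMAIN":
            # Normalize case and the trailing root dot before deduplicating.
            nx_names[client].add(qname.lower().rstrip("."))
    return {c: (len(nx_names[c]), totals[c]) for c in totals}


def suspicious_clients(records, min_unique_nx=3, min_ratio=0.5):
    """Flag clients with many distinct NXDomain names relative to their traffic.

    Both thresholds are arbitrary illustration values, not tuned numbers.
    """
    flagged = []
    for client, (unique_nx, total) in nxdomain_stats(records).items():
        if unique_nx >= min_unique_nx and unique_nx / total >= min_ratio:
            flagged.append(client)
    return sorted(flagged)


if __name__ == "__main__":
    print(suspicious_clients(SAMPLE_RECORDS))
```

A single mistyped name, as for the second client in the sample data, does not trip the filter, while a client that burns through several distinct non-existent names does. A real deployment would need far more than this, since misconfigured software and browsers also generate NXDomain traffic.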

Pleiades, a machine learning system designed by Damballa Labs, can successfully detect the rise of DGA-based malware even in the absence of any malware samples, previous knowledge of the actual DGA, or knowledge of an active DGA-related malicious domain name. This system was recently published at USENIX Security 2012, https://www.usenix.org/conference/usenixsecurity12/throw-away-traffic-bots-detecting-rise-dga-based-malware. In that paper we thoroughly explain the mechanics behind our DGA detection methodologies and expose a number of known and unknown DGAs.

During 2012 we saw several new DGAs on the rise. The latest, and perhaps the most interesting, was the new iteration of the TDSS/TDL4 click-fraud module https://www.damballa.com/tdl4/. During the summer of 2012, in collaboration with the Georgia Tech Information Security Center (GTISC) http://www.gtisc.gatech.edu, Damballa Labs was able to enumerate a portion of the TDSS/TDL4 botnet by taking over some of the domain names used in the click-fraud DGA.



The report we released exposed the click-fraud DGA component used by the TDSS/TDL4 botnet. Damballa Labs still tracks the DGA, and a few weeks ago we sampled a few more domain names from the same TDSS/TDL4 DGA. Sadly, it appears that the botnet is still quite active. According to the data we collect in the sinkhole, we have to date observed 422,000 unique IP addresses around the world successfully connecting to the sinkhole using the click-fraud protocols discussed in our white paper. We should note that the sinkhole data have been made available to the security community for remediation and to vetted academic researchers via SIE@SIE https://sie.isc.org (Channel 81).

Traditionally, DGAs have been used for C&C purposes; however, cases like this demonstrate that DGAs can be used in other components of the threat lifecycle. For example, I had the pleasure of getting some insight into recent work from UCSD’s security and cryptography research group http://cryptosec.ucsd.edu/. The folks there, led by David Wang, will present a very interesting study titled “Juice: A Longitudinal Study of an SEO Campaign” at the upcoming NDSS conference http://www.internetsociety.org/events/ndss-symposium-2013/ndss-2013-programme/ndss-2013-schedule-tuesday-february-26. They discuss extensively the discovery and practices of an SEO botnet they named “GR”. One of the many interesting observations in their paper is that this SEO botnet employs a DGA as part of its directory service component.

Hopefully, I will be able to discuss this paper in depth in a future post. What we can already observe is that DGAs are gaining breadth in the illicit activities they support, ranging from SEO to click-fraud and infector sites. This means that the ability to detect and track DGAs in your network is an important capability that can improve the overall security of the network and defend the organization against rising threats.

With the help of the security community, we are currently investigating three new DGAs on the rise. In collaboration with GTISC, we have taken actions similar to those we took with the TDSS/TDL4 click-fraud module. The three DGAs are code-named Cv6, Mv4 and Mv16.

The population reaching out to them at the moment is on the order of a few thousand “potentially” infected hosts. What remains to be discovered is the intention behind the DGAs and the related malware. For example, is Cv6 a new Zeus DGA variant? Why is the Mv4 DGA responsible for a more than fivefold increase in all NXDomain traffic observed in ISP networks? One thing is for sure: Mv16 is the first DGA that talks IRC on port 443!

Stay tuned…more to follow.


Manos Antonakakis
Senior Director of Research, Damballa Labs
