|=---------=[ DNS Covert Channels and Bouncing Techniques ]=-------------=|

This article describes a new type of covert channel in the DNS protocol and its application as a communication scheme for an Internet worm. It is divided into two parts:

Part 1 is a description of the NCACHE channel, with a summary of standard DNS behavior, an introduction of the communication method, and its benefits and limitations.

Part 2 discusses how this scheme could be used for co-ordination between worm instances, its potential for being detected, and other enhancements.


Table of contents

  1. NCACHE Channel Description
     1.1. Introduction
     1.2. DNS Behavior
     1.3. Communication Details
          1.3.1. Advantages
          1.3.2. Choosing a Good Domain and Subdomains
     1.4. Potential Issues
          1.4.1. Efficiency
          1.4.2. Synchronization
          1.4.3. Reliability
  2. Application for Worm Communication
     2.1. More on Choosing Good Domains
     2.2. What Kind of Data is Useful?
     2.3. Detection
     2.4. Closing Notes
  3. References


PART 1. NCACHE Channel Description

---[ 1.1 Introduction

In the last 15 years several covert channels in DNS have been found [1], and utilities such as nstx [3] and DNShell [4] have been developed for communication between hosts by using UDP packets with DNS queries. By creating regular resource records (type TXT, A or any other) with information disguised as legitimate RR content, it is possible to communicate while maintaining the illusion that our packets represent innocent DNS queries.

However, most of the methods proposed require that the parties which want to engage in the communication have authority over a DNS zone and run a DNS server. For many purposes this is not acceptable, because it allows the communication to be traced back to the owner of the domain, and it provides the communicating parties with no benefits if DNS traffic is monitored or if a NIDS is deployed.

Also, most covert channels in established protocols rely on overloading the meaning of a certain portion of the protocol (the sequence number field in TCP, the TXT resource record in DNS), and they still require direct data flow between communicating hosts. For an administrator or reverse engineer the mere fact that a suspiciously behaving host is exchanging traffic with another machine is a strong hint that they are both infected with some kind of malware.

One notable exception to this is an interesting idea touched upon by Dan Kaminsky at Black Ops 2004 -- "DNS Cache Modulation" [1]. In short, it relies on two hosts querying a given DNS server for the same domain and telling, from the reply, whether the server answered using information from its local cache. Unfortunately, the way it was originally described made it unnecessarily complex and limiting.

In this article I build upon this idea and describe a new covert channel using DNS negative caching (NCACHE). The NCACHE channel uses the DNS infrastructure to pass messages and does not require the hosts to communicate directly. It allows a worm to co-ordinate all hosts on one network (for our purposes a network is a group of hosts using the same DNS server or multiple DNS servers with a shared cache) solely by issuing valid DNS queries to the ISP's servers.


---[ 1.2 DNS Behavior

Whenever a client machine wants to connect to an Internet host (such as www.google.com) it queries its local DNS forwarding server for the address of this host. When the forwarding DNS server retrieves this information from an authoritative server, it caches it for the number of seconds specified by the zone administrator in the TTL (time-to-live) field. All queries within the next TTL seconds will be satisfied from the cache; as part of the response the local forwarder will also include the number of seconds until the record expires from the cache.
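
The caching behavior described above can be illustrated with a toy model. This is only a sketch of the idea, not a real DNS implementation: the class name ToyForwarder, the upstream table, the example hostname and the 300-second TTL are all assumptions made for the illustration.

```python
class ToyForwarder:
    """Toy model of a caching forwarder (hypothetical, no networking)."""

    def __init__(self, upstream):
        # upstream maps name -> (address, ttl_seconds) and stands in
        # for the authoritative servers.
        self.upstream = upstream
        self.cache = {}  # name -> (address, absolute expiry time)

    def lookup(self, name, now):
        """Return (address, remaining_ttl, served_from_cache)."""
        entry = self.cache.get(name)
        if entry is not None and entry[1] > now:
            address, expires = entry
            # Cache hit: report how many seconds the record has left.
            return address, expires - now, True
        # Cache miss or expired entry: ask upstream and cache the answer.
        address, ttl = self.upstream[name]
        self.cache[name] = (address, now + ttl)
        return address, ttl, False


forwarder = ToyForwarder({"www.example.com": ("192.0.2.10", 300)})
first = forwarder.lookup("www.example.com", now=0)
second = forwarder.lookup("www.example.com", now=120)
third = forwarder.lookup("www.example.com", now=400)
print(first)   # ('192.0.2.10', 300, False)
print(second)  # ('192.0.2.10', 180, True)
print(third)   # ('192.0.2.10', 300, False)
```

The second lookup is answered from the cache with a reduced TTL, which is the property the rest of this article relies on: the reply itself reveals whether someone asked the same question recently.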

This behavior, designed to improve performance, is also applied to queries for non-existent domains. If a person mistypes a URL, the NXDOMAIN answer received from the authoritative server will also be cached by the forwarder, so other people who make the same mistake will be immediately notified of it. Negative caching was introduced as optional by RFC 1034, but made its way to the official standard via RFC 2308, which specifically states that any resolver which maintains a cache must also cache negative answers.

The TTL for a non-existent domain is controlled by its parent zone. For almost all TLDs that parameter is either 3600 or 10800 seconds (one or three hours), i.e. if we query for nxdomain-1.com or nxdomain-1.nxtld, our local DNS server will cache the result of our query for as long as 10800 seconds.

We can easily check for the presence of a particular domain in the cache by issuing a non-recursive query to the forwarder. If the answer we get is a referral (it contains the NS records for one of the parents of the queried zone), we know that the information we seek isn't cached: no-one has queried for this particular domain in the last TTL seconds.

When querying for a non-existent domain this process is even easier: if the domain is in the cache, the answer status is NXDOMAIN. Otherwise, the answer will be NOERROR, accompanied by several NS records for the parent zone.


---[ 1.3 Communication Details

The gist of this scheme is as follows: two hosts on the same network query for the same domain, which both know not to exist. If the second host queries for the domain while it is still in the cache, it will notice this fact and will infer that it has someone to talk to on its network. If the second host uses a non-recursive query, it will not change the state of the cache, and thus other hosts can receive this 'message' in the same manner.
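
The interplay of recursive and non-recursive queries can be traced in a small in-memory model. This is a sketch under stated assumptions: ToyNegativeCache, is_cached and the name "control.nxdomain1" are hypothetical, the 10800-second TTL is the three-hour figure quoted above, and no real DNS traffic is involved.

```python
NXDOMAIN = "NXDOMAIN"
NOERROR_REFERRAL = "NOERROR (referral)"


class ToyNegativeCache:
    """In-memory stand-in for a forwarder's negative cache."""

    def __init__(self, negative_ttl=10800):
        self.negative_ttl = negative_ttl
        self.expiry = {}  # name -> time at which the negative entry expires

    def query(self, name, now, recursive):
        """Answer a query for a name that is assumed not to exist."""
        if self.expiry.get(name, 0) > now:
            return NXDOMAIN  # answered from the negative cache
        if recursive:
            # The forwarder asks upstream, learns the name does not
            # exist, and caches that fact for negative_ttl seconds.
            self.expiry[name] = now + self.negative_ttl
            return NXDOMAIN
        # Non-recursive and not cached: a referral, cache left untouched.
        return NOERROR_REFERRAL


def is_cached(cache, name, now):
    """Probe with a non-recursive query, which never alters the cache."""
    return cache.query(name, now, recursive=False) == NXDOMAIN


cache = ToyNegativeCache()
before = is_cached(cache, "control.nxdomain1", now=0)
cache.query("control.nxdomain1", now=10, recursive=True)  # first host sets
during = is_cached(cache, "control.nxdomain1", now=60)    # second host reads
again = is_cached(cache, "control.nxdomain1", now=120)    # third host reads
after = is_cached(cache, "control.nxdomain1", now=10 + 10800)
print(before, during, again, after)  # False True True False
```

Reading is idempotent in this model, so any number of hosts can observe the same entry until it expires. The same probe is what a defender would use to audit a resolver for cache snooping exposure.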

By querying a previously agreed-upon set of subdomains of this non-existent domain the hosts will be able to covertly exchange messages, treating each cached subdomain as a '1' bit and each non-cached subdomain as a '0' bit. While this isn't a particularly efficient communication method (we can only send/receive 1 bit per query), it has some surprising applications which we will shortly discuss.


---[ 1.3.1 Advantages

There are three main advantages of this method:

1. Neither host needs to know the address of the other, nor do the hosts need to agree on a particular server to use for the communication - they will both use the default DNS server(s), which in most cases are the ones maintained by their ISP.

2. Communicating hosts never directly exchange any traffic; all work is done by issuing DNS queries. It follows that a host can make it known to an entire network that it is willing to talk, while never sending anything but a query to the DNS server.

3. In contrast to other covert channels in DNS, we do not need to maintain a domain or run a DNS server; we make use of the existing infrastructure maintained by our ISP.


---[ 1.3.2. Choosing a Good Domain and Subdomains

The set of subdomains for which a query is issued should be automatically derived by each host, so the number of bits that can be set isn't limited. Some possible choices are:

    00.nxdomain1, 01.nxdomain1, 02.nxdomain1 ...
    aaa.nxdomain2, aab.nxdomain2, aac.nxdomain2 ...
    www-1.nxdomain3, www-2.nxdomain3, www-3.nxdomain3 ...

There are several fine points to selecting the right (sub)domains. We must make a trade-off between avoiding drawing attention to our queries and assuring that a particular domain is invalid (by "invalid" we mean that it doesn't exist and that it is a child of a TLD or the root domain). While choosing "bogus.foo" might seem tempting, a particularly astute sysadmin who happens to monitor DNS traffic could expose our channel.
On the other hand, if we query for subdomains of "go-ogle.com" we might find that this domain exists and other users on the network frequent this website. While not necessarily devastating for our scheme, this would introduce three problems:

1. Legitimate traffic could interfere with our communication, making it seem like there is a message set when there really is none.

2. The DNS administrator for the queried zone could notice an increase in the number of queries for its members from our network and expose our channel.

3. The DNS administrator for the queried zone could set the MINIMUM field in the SOA record to 0, thus disabling negative caching for the zone and disabling our channel.

A reasonable idea might be to try subdomains similar to "www-03.ib.mcom", which exploits the naming scheme of IBM public webservers and also assures that no such host will exist on the Internet. Of course a sudden spike of queries for .mcom domains wouldn't go unnoticed for long...

Side note: for a particularly paranoid individual with an aversion to NXDOMAIN responses we can modify our scheme to do one of the following:

1. Query for one or more obscure, but legal domains which are unlikely to get traffic from our network (hosts with most two-letter TLDs, with the exception of some famous domains in .cx).

2. Query for some .in-addr.arpa domains.

3. Query for subdomains of a known wildcard-enabled zone, such as "linux01.slashdot.org".

Communication using those alternative methods would be similar, but could exhibit slightly different properties and introduce new challenges. For the sake of brevity I will stick to the NCACHE channel in this article.


---[ 1.4 Potential Issues

Before we describe the usefulness of this channel, I will make one last attempt to discourage the reader by listing the limitations of our scheme.


---[ 1.4.1 Efficiency

Each query allows us to send/read only one bit. With approx. 50 bytes per query and 100 bytes per DNS answer, we will need to push 2.4MB of data through the network to send and receive each kilobyte of useful information. If we want to post our IP address in the cache we will need to generate 32 packets with DNS queries.


---[ 1.4.2 Synchronization

It is relatively easy to post our message in the cache and for other hosts to receive it. However, for two-way communication we would need to have a 'semaphore' domain which another host would set when it has a message. We could then proceed to query the predefined subdomains of the semaphore domain. The catch is that we won't know immediately that the semaphore domain is set, so we will have to query for it non-recursively at regular intervals to see if there is a message waiting for us, which will increase the traffic we generate and might give us away.


---[ 1.4.3 Reliability

One problem is that the channel is vulnerable to several race conditions. If Host A wants to broadcast its message for an extended period of time it will have to resend all its queries at exactly TTL-second intervals to repopulate the cache. If Host B happens to issue a query before Host A renews the message, it won't see the information in the cache, or worse, it will assume it can broadcast information using the same set of domains, therefore creating a mangled, uninterpretable message. It's possible to safeguard against that by using two 'control' domains with a slight time shift, but one generally has to be wary of such details.

Another potential problem would occur if the DNS server couldn't meet the requirement to cache the record for the entire TTL period. This could happen if it was flooded with queries, or just poorly implemented. DNS administrators and server programmers try not to let that happen, and it usually doesn't, so we'll assume that the server we use is sane and conforms to the standard.

Also, this method of communication probably won't withstand reverse engineering of any of our clients on a network.
Once the details of our protocol are known, an individual can mangle our message by setting all our bits to '1' by querying all subdomains in our set. If we so desired, we could probably create a "mirrored set" of subdomains with opposite information to enable a reliability check, i.e.:

      a.nxdom = 1;    b.nxdom = 0;    c.nxdom = 1;    d.nxdom = 1;
    m-a.nxdom = 0;  m-b.nxdom = 1;  m-c.nxdom = 0;  m-d.nxdom = 0;

We could then proceed to XOR both messages and assume the message is correct if all bits of the result are set to '1'. This would introduce unnecessary complexity, increase the amount of traffic we generate and would be utterly useless if an administrator decided to purge the DNS cache. In other words, for someone with the right mindset to implement this entire communications scheme, it will probably make perfect sense to include this 'feature' as well.


---[ 2. Application for Worm Communication

The ability to make our message visible to all hosts on a network without having to establish direct connections with them provides us with a way to communicate in a more subtle fashion than what is being observed in the wild.

Currently, writers of distributed malware who want to execute commands on controlled hosts typically communicate via IRC networks to which every infected host connects and awaits orders. However, as such malware grows more sophisticated [5], we will probably see attempts to create less ostentatious control mechanisms, i.e. ones that generate less traffic and are more covert.

Let's see how we can use the NCACHE channel for such a purpose. Suppose a worm infects one host on a given network. After hiding its presence in the OS and maybe patching the vulnerability it exploits, it will query its local DNS server non-recursively for the agreed-upon non-existent domain to check if it has any peers on this network. If so, it will query the predefined set of subdomains to retrieve a message left by its colleague.
If not, the worm will assume that it's the only worm of its kind on the network, recursively query for the control domain, and then set a message using its subdomains, becoming a "master worm" which will co-ordinate the actions of other worms on the network.


---[ 2.1 More on Choosing Good Domains

One more caveat regarding choosing an appropriate domain name: it should be set according to our guidelines from Part 1, but we would definitely want to avoid hardcoding the same control domain into all copies of the worm. If all our worms used the same domain name, sooner or later an administrator or reverse engineer would discover it, which would immediately blow our cover globally and allow for the creation of simple NIDS rules to detect our presence.

Therefore all worms on a network should use a piece of network-specific data in the construction of the control domain. This could be the network address, the DNS server IP (although we need to be careful because almost all ISPs have several DNS servers with a shared cache), the ISP name or any variation thereof. Any worm on the network can obtain this data, but the resulting domain can't be automatically detected by a NIDS.

A rather straightforward example: if we're on the 172.16.0.0/16 network we can express the first two octets as hex numbers and get "ac10.org" as the control domain. This particular naming scheme would have the side effect of dividing all hosts using one DNS server into groups depending on their subnet, which might not be desirable in some cases.

It is of course possible to obfuscate the control domain name even more, while making it seem more believable; this is left as an exercise to the reader.


---[ 2.2 What Kind of Data is Useful?

Assuming that we have an inconspicuous control domain and a believable subdomain naming scheme, what kind of data would we like to transfer? Again, we have to make a trade-off between the size of the message and the risk of being detected.
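
This trade-off can be put into rough numbers using the per-packet sizes from section 1.4.1. The sketch below is only an upper-bound estimate: it assumes one query/answer pair per bit for the sender and one per bit for a single reader, whereas in practice the sender only has to query for the '1' bits. The function name channel_cost and the sample message sizes are illustrative assumptions.

```python
QUERY_BYTES = 50    # approximate size of one DNS query (section 1.4.1)
ANSWER_BYTES = 100  # approximate size of one DNS answer (section 1.4.1)


def channel_cost(message_bits):
    """Return (queries, bytes on the wire) to send AND read a message.

    Upper bound: one query/answer pair per bit for the sender plus one
    per bit for a single reader.
    """
    queries = 2 * message_bits
    return queries, queries * (QUERY_BYTES + ANSWER_BYTES)


samples = [("single flag", 1), ("IPv4 address", 32), ("1 KB payload", 8 * 1024)]
for label, bits in samples:
    queries, total = channel_cost(bits)
    print(f"{label}: {queries} queries, {total} bytes")
# single flag: 2 queries, 300 bytes
# IPv4 address: 64 queries, 9600 bytes
# 1 KB payload: 16384 queries, 2457600 bytes
```

The last line reproduces the figure of roughly 2.4MB per kilobyte quoted in Part 1, and shows why short messages such as flags or a single address are the only realistic payloads.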
Although theoretically we could post entire patches or diffs to our code, this would be a sure way to attract attention.

A better idea might be for the master worm to post its IP address, or maybe just the last [32 - subnet mask length] bits of it. Other worms could then connect to the specified port (the port could be dynamic and also posted in the DNS cache). This would be an interesting way for the master worm to obtain the list of infected hosts on its network without broadcasting messages directly to each host.

For the more paranoid, it might suffice to set flags which would be understood by other worms. The presence of a given subdomain in the cache could mean "do not infect any other hosts on my network", "remain hidden" or "do your deed, minions!".

This last concept leads us to a final point in this section, which is the ability of the worm writer to manually trigger a certain action of a worm. Assuming he knows the control domain used by his worm on a certain network, he can issue a query for a subdomain, which would prompt the worms to perform a certain action.

It is also conceivable that similar interaction could occur between separate networks, with worms on one network looking up another ISP's DNS server addresses and communicating across network boundaries.


---[ 2.3 Detection

At the moment there are no NIDS capable of detecting this method of communication on a network. If the domain and subdomains are picked sensibly, each packet on its own will seem completely benign. What could attract attention is a sudden increase in the volume of DNS traffic, and especially of NXDOMAIN replies (although using a wildcard domain or some .in-addr.arpa queries would result in NOERROR replies).

Since all the replies we get are eventually removed from the cache, the only way to try and reconstruct a message would be to keep a log of all DNS requests and look for patterns.
This could be based on the number of queries for a particular domain and the intervals between queries (the host setting a message will have to resend its queries every TTL seconds), but would still return a lot of false positives.

The way we described it, our scheme would also be vulnerable to detection due to one characteristic: we query for a domain, and even after receiving an NXDOMAIN response we query for its subdomains. This is obviously suspicious, as no sane program should behave like that. It's possible to work around that by querying the subdomains of a different domain than the control domain. We lose the benefit of being sure that the parent of the used subdomain doesn't exist, but if we picked the name wisely, it should not be a problem.


---[ 2.4 Closing Notes

As mentioned in Part 1, we do not need to rely on the NCACHE behavior to implement such a scheme: it might be interesting to experiment with wildcard domains or .in-addr.arpa lookups. Also, we're not confined to the DNS infrastructure: almost any system which caches queries (think Web proxies) can be used for a similar purpose.

And while a widespread worm using co-ordination mechanisms is yet to be written, you might just want to start logging the requests to your company's DNS server...


---[ 3. References

[1]  http://www.cs.ucl.ac.uk/staff/M.Rogers/kaminsky.html
[2]  http://www.dns.net/dnsrd/rfc/index.html
[3]  http://nstx.dereference.de/nstx/
[4]  http://www.klake.org/~jt/dnshell
[5]  http://www.schneier.com/blog/archives/2005/06/attack_trends_2.html
[6]  http://blanu.net/curious_yellow.html
[7]  ftp://ftp.rfc-editor.org/in-notes/rfc1034.txt
[8]  ftp://ftp.rfc-editor.org/in-notes/rfc1035.txt
[9]  ftp://ftp.rfc-editor.org/in-notes/rfc2308.txt
[10] http://www.digitalsec.net/

|=[ EOF ]=---------------------------------------------------------------=|