|=---------=[ DNS Covert Channels and Bouncing Techniques ]=-------------=|

    This article describes a new type of covert channel in the DNS protocol
and its application as a communication scheme for an Internet worm. It is
divided into two parts: Part 1 is a description of the NCACHE channel,
with a summary of standard DNS behavior, introduction of the communication
method, benefits and limitations. Part 2 discusses how this scheme could
be used for co-ordination between worm instances, potential for being
detected and other enhancements.

Table of contents

1. NCACHE Channel Description
        1.1. Introduction
        1.2. DNS Behavior
        1.3. Communication Details
                1.3.1. Advantages
                1.3.2. Choosing a Good Domain and Subdomains
        1.4. Potential Issues
                1.4.1. Efficiency
                1.4.2. Synchronization
                1.4.3. Reliability
2. Application for Worm Communication
        2.1. More on Choosing Good Domains
        2.2. What Kind of Data is Useful?
        2.3. Detection
        2.4. Closing Notes
3. References


PART 1. NCACHE Channel Description

---[ 1.1 Introduction

    In the last 15 years several covert channels in DNS have been found [1],
and utilities such as nstx [3] and DNShell [4] have been developed for
communication between hosts by using UDP packets with DNS queries.

    By creating regular resource records (type TXT, A or any other) with
information disguised as legitimate RR content, it is possible to
communicate while maintaining the illusion that our packets represent
innocent DNS queries. However, most of the methods proposed require that
the parties which want to engage in the communication have authority over
a DNS zone and run a DNS server. For many purposes this is not acceptable,
because it allows the communication to be traced back to the owner of the
domain, and it gives the communicating parties no benefit if DNS traffic
is monitored or if a NIDS is deployed.

    Also, most covert channels in established protocols rely on overloading
the meaning of a certain portion of the protocol (the sequence number field
in TCP, the TXT resource record in DNS), and they still require direct data
flow between communicating hosts. For an administrator or reverse engineer
the mere fact that a suspiciously behaving host is exchanging traffic with
another machine is a strong hint that they are both infected with some
kind of malware.

    One notable exception to this is an interesting idea touched upon by
Dan Kaminsky at Black Ops 2004 -- "DNS Cache Modulation" [1]. In short, it
relies on two hosts querying a given DNS server for the same domain and
observing whether the server replied using information from its local
cache. Unfortunately, the way it was originally described made it
unnecessarily complex and limiting.

    In this article I build upon this idea and describe a new covert
channel using DNS negative caching (NCACHE). The NCACHE channel uses the
DNS infrastructure to pass messages and does not require the hosts to
communicate directly. It allows a worm to co-ordinate all hosts on one
network (for our purposes a network is a group of hosts using the same DNS
server or multiple DNS servers with a shared cache) solely by issuing
valid DNS queries to the ISP's servers.

---[ 1.2 DNS Behavior

    Whenever a client machine wants to connect to an Internet host (such as
www.google.com) it queries its local DNS forwarding server for the address
of this host. When the forwarding DNS retrieves this information from an
authoritative server, it caches this information for the number of seconds
specified by the zone administrator in the TTL (time-to-live) field. All
queries within the next TTL seconds will be satisfied from the cache; as
part of the response the local forwarder will also include the number of
seconds until the record expires from the cache. 

    This behavior, designed to improve performance, is also applied to
queries for non-existent domains. If a person mistypes a URL, the NXDOMAIN
answer received from the authoritative server will also be cached by the
forwarder, so other people who make the same mistake will be immediately
notified of it.

    Negative caching was introduced as optional by RFC 1034 [7], but made
its way into the official standard via RFC 2308 [9], which specifically
states that any resolver which maintains a cache must also cache negative
answers. The TTL for a non-existent domain is controlled by its parent
zone. For almost all TLDs that parameter is either 3600 or 10800 seconds
(one or three hours), e.g. if we query for nxdomain-1.com or
nxdomain-1.nxtld and the parent zone specifies 10800 seconds, our local DNS
will cache the result of our query for 10800 seconds.

    We can easily check for the presence of a particular domain in the
cache by issuing a non-recursive query to the forwarder. If the answer we
get is a referral (it contains the NS records for one of the parents of the
queried zone), we know that the information we seek isn't cached: no-one
has queried for this particular domain in the last TTL seconds. When
querying for a non-existent domain this process is even easier: if the
domain is in the cache, the answer status is NXDOMAIN. Otherwise, the
answer will be a NOERROR, accompanied by several NS records for the parent
zone.
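
    To make the cache check concrete, here is a minimal Python sketch of
what distinguishes a non-recursive query on the wire and how the two
outcomes described above could be told apart. The function names, the
fixed query ID and the returned labels are illustrative assumptions of
mine; only the header layout (the RD bit, RCODE 3 for NXDOMAIN) is taken
from RFC 1035. The sketch only builds and inspects byte strings; nothing is
sent over the network.

```python
import struct

RD_FLAG = 0x0100  # "recursion desired" bit in the DNS header flags word


def build_query(name, query_id=0x1234, recursive=False):
    """Return the wire format of an A/IN query for `name`."""
    flags = RD_FLAG if recursive else 0
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def classify_response(packet):
    """Classify a reply to a non-recursive query for a non-existent name."""
    _id, flags, _qd, ancount, nscount, _ar = struct.unpack(
        "!HHHHHH", packet[:12]
    )
    rcode = flags & 0x000F
    if rcode == 3:                      # NXDOMAIN: negative answer was cached
        return "cached-nxdomain"
    if rcode == 0 and ancount == 0 and nscount > 0:
        return "referral-not-cached"    # NOERROR plus NS records: a referral
    return "other"
```

The only difference between the two query types is a single bit in the
header, which is part of why each packet looks unremarkable on its own.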

---[ 1.3 Communication Details

    The gist of this scheme is as follows: two hosts on the same network
query for the same domain which is known to them not to exist. If the
second host queries for the domain while it is still in the cache, it will
notice this fact and will infer that it has someone to talk to on its
network. If the second host uses a non-recursive query, it will not change
the state of the cache and thus make it possible for other hosts to receive
this 'message' in a similar manner.

    By querying a previously agreed-upon set of subdomains of this
non-existent domain the hosts will be able to covertly exchange messages,
treating each cached subdomain as a '1' bit and each non-cached subdomain
as a '0' bit. While this isn't a particularly efficient communication
method (we can only send/receive 1 bit per query), it has some surprising
applications which we will shortly discuss.
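
    The bit mapping itself can be modelled without any DNS traffic at all.
The sketch below is only a toy model under my own assumptions: a Python set
stands in for the resolver's negative cache, and the two-digit naming
scheme and the parent name "nxdomain1" are placeholders.

```python
def subdomain_for_bit(index, parent="nxdomain1"):
    """Name that represents bit number `index` (hypothetical scheme)."""
    return f"{index:02d}.{parent}"


def names_for_message(bits, parent="nxdomain1"):
    """Names that end up cached for a bit string: one per '1' bit."""
    return [
        subdomain_for_bit(i, parent)
        for i, bit in enumerate(bits)
        if bit == "1"
    ]


def read_message(cached_names, length, parent="nxdomain1"):
    """Rebuild the bit string from the set of names found in the cache."""
    return "".join(
        "1" if subdomain_for_bit(i, parent) in cached_names else "0"
        for i in range(length)
    )


# A set plays the role of the shared cache in this model.
cache = set(names_for_message("1011"))
print(read_message(cache, 4))  # 1011
```

Note that a '0' bit is represented by the absence of a name, so the reader
has to know the message length in advance.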

---[ 1.3.1 Advantages

There are three main advantages of this method:

1. Neither host needs to know the address of the other, nor do the hosts
   need to agree on a particular server to use for the communication - they
   will both use the default DNS server(s), which in most cases is the one
   maintained by their ISP.

2. Communicating hosts never directly exchange any traffic; all work is
   done by issuing DNS queries. It follows that a host can make it known
   to an entire network that it is willing to talk, while never sending
   anything but a query to the DNS server.

3. In contrast to other covert channels in DNS, we do not need to maintain
   a domain or run a DNS server; we make use of the existing infrastructure
   maintained by our ISP.

---[ 1.3.2 Choosing a Good Domain and Subdomains

    The set of subdomains for which a query is issued should be
automatically derived by each host, so the number of bits that can be set
isn't limited. Some possible choices are:

00.nxdomain1, 01.nxdomain1, 02.nxdomain1 ...
aaa.nxdomain2, aab.nxdomain2, aac.nxdomain2 ...
www-1.nxdomain3, www-2.nxdomain3, www-3.nxdomain3 ...

    There are several fine points to selecting the right (sub)domains. We
must make a trade-off between not drawing attention to our queries and
assuring that a particular domain is invalid (by "invalid" we mean that it
doesn't exist and that it is a child of a TLD or the root domain). While
choosing "bogus.foo" might seem tempting, a particularly astute sysadmin
who happens to monitor DNS traffic could expose our channel. On the other
hand, if we query for subdomains of "go-ogle.com" we might find that this
domain exists and other users on the network frequent this website. While
not necessarily devastating for our scheme, this would introduce three
problems:

1. Legitimate traffic could interfere with our communication, making it
   seem like there is a message set when there really is none.

2. The DNS administrator for the queried zone could notice an increase in
   the number of queries for its members from our network and expose our
   channel.

3. The DNS administrator for the queried zone could set the MINIMUM field
   in the SOA record to 0, thus disabling caching for the zone and disabling
   our channel.

    A reasonable idea might be to try subdomains similar to
"www-03.ib.mcom", which exploits the naming scheme of IBM public webservers
and also assures that no such host will exist on the Internet. Of course a
sudden spike of queries for .mcom domains wouldn't go unnoticed for long...

Side note: for a particularly paranoid individual with an aversion to
NXDOMAIN responses we can modify our scheme to do one of the following:

1. Query for one or more obscure, but legal domains which are unlikely to
   get traffic from our network (hosts under most two-letter TLDs, with the
   exception of some famous domains in .cx).

2. Query for some .in-addr.arpa domains.

3. Query for subdomains of a known wildcard-enabled zone, such as
   "linux01.slashdot.org".

    Communication using those alternative methods would be similar, but
could exhibit slightly different properties and introduce new challenges.
For the sake of brevity I will stick to the NCACHE channel in this article.

---[ 1.4 Potential Issues

    Before we describe the usefulness of this channel, I will make one last
attempt to discourage the reader by listing the limitations of our scheme.

---[ 1.4.1 Efficiency

    Each query allows us to send/read only one bit. With approx. 50 bytes
per query and 100 bytes per DNS answer, we will need to push about 2.4 MB
of data through the network to send and receive each kilobyte of useful
information. If we want to post our IP address in the cache we will need to
generate 32 packets with DNS queries.
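
    The figure above can be reproduced with a few lines of arithmetic.
This is just a worked check of the estimate, assuming one query and one
answer per bit on both the sending and the receiving side, and using the
approximate packet sizes quoted in the text.

```python
QUERY_BYTES = 50    # approximate size of one DNS query (from the text)
ANSWER_BYTES = 100  # approximate size of one DNS answer (from the text)


def traffic_bytes(payload_bytes):
    """Bytes on the wire to both send and receive `payload_bytes` of data."""
    bits = payload_bytes * 8
    one_direction = bits * (QUERY_BYTES + ANSWER_BYTES)
    return 2 * one_direction  # once to set the bits, once to read them


print(traffic_bytes(1024))  # 2457600, i.e. roughly 2.4 MB per kilobyte
```

In other words, the channel carries one useful byte for every 2400 bytes of
DNS traffic under these assumptions.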

---[ 1.4.2 Synchronization

    It is relatively easy to post our message in the cache and for other
hosts to receive it. However, for two-way communication we would need to
have a 'semaphore' domain which another host would set when it has a
message. We could then proceed to query the predefined subdomains of the
semaphore domain. The catch is that we won't know immediately that the
semaphore domain is set, so we will have to nonrecursively query for it at
regular intervals to see if there is a message waiting for us, which will
increase the traffic we generate and might give us away.

---[ 1.4.3 Reliability

    One problem is that the channel is vulnerable to several race
conditions. If Host A wants to broadcast its message for an extended period
of time it will have to resend all its queries in exactly TTL-second
intervals to populate the cache. If Host B happens to issue a query before
Host A renews the message, it won't see the information in the cache, or
worse, it will assume it can broadcast information using the same set of
domains, therefore creating a mangled, uninterpretable message. It's
possible to safeguard against that by using two 'control' domains with a
slight time shift, but one generally has to be wary of such details.

    Another potential problem would occur if the DNS server couldn't meet
the requirement to cache the record for the entire TTL period. This could
happen if it was flooded with queries, or just poorly implemented. DNS
administrators and server programmers try not to let that happen, and it
usually doesn't, so we'll assume that the server we use is sane and
conforms to the standard.

    Also, this method of communication probably won't withstand reverse
engineering of any of our clients on a network. Once the details of our
protocol are known, an individual can mangle our message by setting all our
bits to '1' by querying all subdomains in our set. If we so desired, we
could probably create a "mirrored set" of subdomains with opposite
information to enable a reliability check, e.g.:

a.nxdom   = 1; b.nxdom   = 0; c.nxdom   = 1; d.nxdom   = 1; 
m-a.nxdom = 0; m-b.nxdom = 1; m-c.nxdom = 0; m-d.nxdom = 0; 

We could then proceed to XOR both messages and assume the message is
correct if all bits are set to '1'.

    This would introduce unnecessary complexity, increase the amount of
traffic we generate and would be utterly useless if an administrator
decided to purge the DNS cache. In other words, for someone with the right
mindset to implement this entire communications scheme, it will probably
make perfect sense to include this 'feature' as well.
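
    The mirrored-set check from this section can be written down in a few
lines. This is a minimal sketch that works on bit strings only; the
function name is my own, and the sample values are the a/b/c/d bits from
the example above.

```python
def mirror_check(primary, mirror):
    """True if every bit of `mirror` is the inverse of the bit in `primary`."""
    if len(primary) != len(mirror):
        return False
    # XOR of each pair must be 1, as described in the text
    return all(int(p) ^ int(m) == 1 for p, m in zip(primary, mirror))


print(mirror_check("1011", "0100"))  # True: intact message
print(mirror_check("1111", "1111"))  # False: every bit was forced to '1'
```

The check detects the "set everything to '1'" attack because the mirrored
bits would then be '1' as well, making the XOR of each pair '0'.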


PART 2. Application for Worm Communication

    The ability to make our message visible to all hosts on a network
without having to establish direct connections with them provides us with
a way to communicate in a more subtle fashion than what is currently
observed in the wild. Today, writers of distributed malware who want to
execute commands on controlled hosts typically communicate via IRC
networks to which every infected host connects and awaits orders. However,
as such malware grows more sophisticated [5], we will probably see attempts
to create less ostentatious control mechanisms, i.e. ones that generate
less traffic and are more covert.

Let's see how we can use the NCACHE channel for such a purpose.

    Suppose a worm infects one host on a given network. After hiding its
presence in the OS and maybe patching the vulnerability it exploits, it
will nonrecursively query its local DNS for the agreed-upon non-existent
control domain to check if it has any peers on this network. If so, it will
query the predefined set of subdomains to retrieve a message left by its
colleague.
If not, the worm will assume that it's the only worm of its kind on the
network, recursively query for the control domain, and then set a message
using its subdomains, becoming a "master worm" which will co-ordinate the
actions of other worms on the network.

---[ 2.1 More on Choosing Good Domains

    One more caveat regarding choosing an appropriate domain name: it
should be set according to our guidelines from Part 1, but we would
definitely want to avoid hardcoding the same control domain into all copies
of the worm. If all our worms used the same domain name, sooner or later an
administrator or reverse engineer would discover it, which would
immediately blow our cover globally, and allow for the creation of simple
NIDS rules to detect our presence. Therefore all worms on a network should
utilize a part of network-specific data in the construction of the control
domain. This could be the network address, DNS server IP (although we need
to be careful because almost all ISPs have several DNS servers with a
shared cache), the ISP name or any variation thereof. Any worm on the
network can obtain this data, but the resulting domain name differs from
network to network and so can't be matched by a single static NIDS rule.

    A rather straightforward example: if we're on the 172.16.0.0/16 network
we can set the control domain by expressing the first two octets as hex
numbers and get "ac10.org" as the control domain. This particular naming
scheme would have a side effect of dividing all hosts using one DNS server
into groups depending on their subnet, which might not be desirable in some
cases. It is of course possible to obfuscate the control domain name even
more, while making it seem more believable; this is left as an exercise to
the reader.
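
    The hex conversion in the example above is easy to verify. The short
sketch below only reproduces that arithmetic; the function name is mine and
the ".org" suffix is simply the one used in the example.

```python
import ipaddress


def hex_label(network):
    """First two octets of an IPv4 network address as four hex digits."""
    net = ipaddress.ip_network(network)
    first, second = net.network_address.packed[:2]
    return f"{first:02x}{second:02x}"


print(hex_label("172.16.0.0/16") + ".org")  # ac10.org
```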

---[ 2.2 What Kind of Data is Useful?

    Assuming that we have an inconspicuous control domain and a believable
subdomain naming scheme, what kind of data would we like to transfer?
Again, we have to make a trade-off between the size of the message and
the risk of being detected.

    Although theoretically we could post entire patches or diffs to our
code, this would be a sure way to attract attention. A better idea might be
for the master worm to post its IP address, or maybe just the last
[32 - subnet mask length] bits of it. Other worms could then connect to
that address on a given port (the port could be dynamic and also posted in
the DNS cache). This would be an interesting way for the master worm to
obtain the list of infected hosts on its network without broadcasting
messages directly to each host.
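
    To illustrate how much the "host bits only" variant saves, here is a
small worked example of the bit arithmetic. The address and prefix length
are made-up sample values, and the function name is my own.

```python
import ipaddress


def host_bits(address, prefix_len):
    """Last (32 - prefix_len) bits of an IPv4 address, as a bit string."""
    n = 32 - prefix_len
    if n == 0:
        return ""
    value = int(ipaddress.IPv4Address(address)) & ((1 << n) - 1)
    return format(value, f"0{n}b")


bits = host_bits("172.16.5.9", 16)
print(bits)       # 0000010100001001
print(len(bits))  # 16 bits instead of the full 32
```

On a /16 network this halves the number of names involved; on a /24 only
eight bits remain.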

    For the more paranoid, it might suffice to set flags which would be
understood by other worms. The presence of a given subdomain in the cache
could mean "do not infect any other hosts on my network", "remain hidden"
or "do your deed, minions!". This last concept leads us to a final point in
this section, which is the ability of the worm writer to manually trigger a
certain action of a worm. Assuming he knows the control domain used by his
worm on a certain network, he can issue a query for a subdomain, which
would prompt the worms to perform a certain action. It is also conceivable
that similar interaction could occur between separate networks with worms
on one looking up another ISP's DNS addresses and communicating across
network boundaries.

---[ 2.3 Detection

    To my knowledge, at the moment there is no NIDS capable of detecting
this method of communication on a network.

    If the domain and subdomains are picked sensibly, each packet on its
own will seem completely benign. What could attract attention is a sudden
increase in the volume of DNS traffic, and especially NXDOMAIN replies
(although using a wildcard domain or some .in-addr.arpa queries would
result in NOERROR replies). Since all the replies we get are eventually
removed from the cache, the only way to try and reconstruct a message would
be to keep a log of all DNS requests and look for patterns. This could
be based on the number of queries for a particular domain and the intervals
between queries (the host setting a message will have to resend its queries
every TTL seconds), but would still return a lot of false positives.
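
    From the defender's side, the pattern search described above could be
sketched roughly as follows. This is a minimal illustration, not a tested
detector: the log format (a list of (timestamp, name) pairs), the tolerance
and the repeat threshold are all assumptions of mine, and the two TTL
values are the typical negative TTLs mentioned in Part 1.

```python
from collections import defaultdict


def periodic_names(log, ttls=(3600, 10800), tolerance=60, min_repeats=3):
    """Names whose successive queries are spaced about one TTL apart.

    `log` is an iterable of (unix_timestamp, queried_name) pairs.
    """
    times = defaultdict(list)
    for timestamp, name in log:
        times[name].append(timestamp)

    suspects = []
    for name, stamps in times.items():
        if len(stamps) < min_repeats:
            continue
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        # Flag the name if every gap is close to one of the candidate TTLs
        if any(all(abs(g - ttl) <= tolerance for g in gaps) for ttl in ttls):
            suspects.append(name)
    return sorted(suspects)


sample = [
    (0, "a.nxdom"), (3600, "a.nxdom"), (7205, "a.nxdom"),
    (10, "www.example.com"), (500, "www.example.com"),
    (9000, "www.example.com"),
]
print(periodic_names(sample))  # ['a.nxdom']
```

As noted above, a heuristic like this will still flag legitimate software
that polls a name at fixed intervals, so its output is a starting point for
investigation rather than proof.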

    The way we described it, our scheme would also be vulnerable to
detection due to one characteristic: we query for a domain, and even after
receiving an NXDOMAIN response we query for its subdomains. This is
obviously troubling, and no sane program should behave like that. It's
possible to work around that by querying the subdomains of a domain other
than the control domain. We lose the benefit of being sure that the parent
of the used subdomain doesn't exist, but if we picked the name wisely, it
should not be a problem.

---[ 2.4 Closing Notes

    As mentioned in Part 1, we do not need to rely on the NCACHE behavior
to implement such a scheme: it might be interesting to experiment with
wildcard domains or .in-addr.arpa lookups. Also, we're not confined to
the DNS infrastructure: almost any system which caches queries (think Web
proxies) can be used for a similar purpose.

    And while a widespread worm using co-ordination mechanisms is yet to be
written, you might just want to start logging the requests to your
company's DNS server...


---[ 3. References

[1] http://www.cs.ucl.ac.uk/staff/M.Rogers/kaminsky.html
[2] http://www.dns.net/dnsrd/rfc/index.html
[3] http://nstx.dereference.de/nstx/
[4] http://www.klake.org/~jt/dnshell
[5] http://www.schneier.com/blog/archives/2005/06/attack_trends_2.html
[6] http://blanu.net/curious_yellow.html
[7] ftp://ftp.rfc-editor.org/in-notes/rfc1034.txt
[8] ftp://ftp.rfc-editor.org/in-notes/rfc1035.txt
[9] ftp://ftp.rfc-editor.org/in-notes/rfc2308.txt
[10] http://www.digitalsec.net/

|=[ EOF ]=---------------------------------------------------------------=|