Unmasking Tor users with DNS


Researchers at the KTH Royal Institute of Technology, Stockholm, and Princeton University in the USA have unveiled a new way to attack Tor and deanonymise its users.

The attack, dubbed DefecTor by the researchers in their recently published paper The Effect of DNS on Tor’s Anonymity, uses the DNS lookups that accompany our browsing, emailing and chatting to put a new spin on Tor’s best-established weakness: correlation attacks.

Tor works by routing your traffic through a randomly chosen ‘circuit’ of three nodes drawn from about 7000 computers that are offered up for the purpose by volunteers around the world.

The first node in the circuit is drawn from a pool of about 2500 out of those 7000, known as ‘entry guards’ and chosen because they have a track record of high uptime and availability. The entry guard knows where your traffic comes from, but not who you are or what’s in it, thanks to Tor’s use of encryption.

The last computer in the circuit is drawn from a pool of 1000 out of 7000, known as ‘exit nodes’. The exit node knows where your traffic is going when it leaves the Tor network, but not who you are or where your traffic started out from.

The middle computer in the circuit, picked from all current Tor nodes, is there to keep the entry guard and the exit node apart so that they can’t easily collude to share their ‘from here’ and ‘to there’ data to subvert the system.
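The three-hop layout described above can be sketched in a few lines of Python. This is a toy model only: the relay names are made up, and the uniform random choice is a simplifying assumption (real Tor path selection is more involved and, for one thing, weights relays by bandwidth).

```python
import random

def build_circuit(relays, guards, exits, rng=random):
    """Pick a (guard, middle, exit) circuit of three distinct relays.

    Toy model: every eligible relay is equally likely to be chosen.
    """
    guard = rng.choice(sorted(guards))
    exit_relay = rng.choice(sorted(set(exits) - {guard}))
    # The middle hop may be any relay not already used in this circuit.
    middle = rng.choice(sorted(set(relays) - {guard, exit_relay}))
    return guard, middle, exit_relay

# Hypothetical relay names, purely for illustration.
relays = {"A", "B", "C", "D", "E", "F"}
guards = {"A", "B"}   # relays trusted as entry guards
exits = {"E", "F"}    # relays willing to act as exit nodes

print(build_circuit(relays, guards, exits))
```

The guard only ever sees the client, and the exit only ever sees the destination, which is why an attacker needs a view of both ends to link them.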

Correlation attacks observe the traffic entering and leaving Tor, and then attempt to pair up incoming and outgoing streams, despite the ‘divide the traffic’ node in the middle of every Tor circuit.

Because the traffic that passes from the client, through the circuit and all the way to the exit node is encrypted, attackers can’t just read it. Instead they have to use low level details like packet lengths and directions to look for known patterns that reveal what sites a user is visiting – a technique called fingerprinting.

An attacker doesn’t have to watch all of Tor’s traffic, or even most of it, to start deanonymising users, but they do need time and access to vantage points from which to observe the incoming and outgoing activity.

Correlation attacks need two vantage points: one to observe incoming traffic and one to observe outgoing traffic. Incoming traffic can be monitored from compromised networks, ISPs or entry guards, and outgoing traffic is typically monitored from exit nodes.

What the new research shows is that DNS requests can be used as an entirely new vantage point from which to successfully observe outgoing traffic and conduct correlation attacks.

A new way to observe Tor traffic

Much of what we do online is preceded by DNS lookups. If you try to view a web page on nakedsecurity.sophos.com, the first thing your computer has to do is discover where to find the site on the vastness of the internet. It does this by converting the name nakedsecurity.sophos.com into an IP address by looking it up in the global Domain Name System.

Knowing what somebody looked up via DNS allows you to make a good guess about the websites they’re visiting.

What the researchers did was to look for the fingerprints of known websites in the encrypted traffic flowing into entry guards and then match it against DNS requests in unencrypted traffic flowing out of exit nodes:

We combine a conventional website fingerprinting attack operating on traffic from ingress sniffing with DNS traffic observed by egress sniffing, creating DefecTor attacks. Our attacks correlate the web sites observed by the website fingerprinting attack in ingress traffic with the web sites identified from DNS traffic.
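In outline, the matching step pairs up two lists of observations. The sketch below is not the researchers’ code: the class names, the `correlate` function, the `domain_to_site` table and the ten-second window are all assumptions made for illustration, and the fingerprinting classifier that produces the ingress guesses is taken as given.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IngressGuess:
    client: str   # who was seen connecting to an entry guard
    time: float   # when, in seconds
    site: str     # site guessed by website fingerprinting

@dataclass(frozen=True)
class EgressLookup:
    time: float   # when the DNS request left the exit node
    domain: str   # domain name seen in the request

def correlate(guesses, lookups, domain_to_site, window=10.0):
    """Keep only the guesses backed by a DNS lookup for the same site
    within `window` seconds (an assumed tolerance)."""
    confirmed = []
    for guess in guesses:
        for seen in lookups:
            same_site = domain_to_site.get(seen.domain) == guess.site
            if same_site and abs(seen.time - guess.time) <= window:
                confirmed.append(guess)
                break
    return confirmed

# Made-up observations.
guesses = [
    IngressGuess("alice", 100.0, "news.example"),
    IngressGuess("bob", 105.0, "blog.example"),
]
lookups = [EgressLookup(103.0, "cdn.news.example")]
domain_to_site = {"cdn.news.example": "news.example"}

print(correlate(guesses, lookups, domain_to_site))
```

Only the guess about alice survives, because it is the only one with a matching DNS request close by in time. The DNS data acts as a filter that throws out fingerprinting guesses with nothing to back them up.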

That might sound simple enough but it is, of course, devilishly complicated.

Amongst other factors that can confuse the picture are: each observed exit node is handling an unknown share of the nearly 300,000 website visits that flow through Tor every ten minutes; individual circuits only last about ten minutes, meaning that users’ exit nodes change regularly; exit nodes cache DNS requests, so visits to websites aren’t always preceded by a DNS lookup; and some websites are far more popular than others.

Web pages can also reference multiple domains (according to the research a visit to one site triggers a little over 10 DNS lookups on average) so only a fraction of the DNS lookups observed actually reflect the sites users are visiting – a Naked Security page that includes an embedded YouTube video will cause your browser to look up youtube.com, for example.

The researchers sorted the wheat from the chaff by looking up all the domains referenced by the one million most popular websites:

We collected 2,540,941 unique domain names from a total of 60,828,453 DNS requests. The dataset contains 2,260,534 domains that are unique to a particular website, i.e., are not embedded on any other top million site; we call these domains unique domains. Unique domains are particularly interesting because they reveal to the adversary what sites among the top million the user has visited.
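The idea of a “unique domain” is easy to show with a toy crawl. The function name, the site names and the data layout below are assumptions for illustration; the paper’s own tooling may look quite different.

```python
from collections import defaultdict

def unique_domains(site_domains):
    """Map each domain requested by exactly one site to that site."""
    seen_on = defaultdict(set)
    for site, domains in site_domains.items():
        for domain in domains:
            seen_on[domain].add(site)
    # A domain is "unique" if only one site in the crawl ever triggers it.
    return {
        domain: next(iter(sites))
        for domain, sites in seen_on.items()
        if len(sites) == 1
    }

# Made-up crawl results: site visited -> domains looked up as a result.
crawl = {
    "news.example": {"news.example", "cdn.news.example", "youtube.com"},
    "blog.example": {"blog.example", "youtube.com"},
}

for domain, site in sorted(unique_domains(crawl).items()):
    print(domain, "->", site)
```

Here youtube.com drops out because both sites embed it, while cdn.news.example points to exactly one site. Seeing a lookup for a unique domain is therefore strong evidence of which site was visited.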

Despite all the obstacles, the team claim they were able to use DNS lookups as a new and effective vantage point for correlation attacks, one that works particularly well for unpopular sites.

…existing website fingerprinting attacks can be augmented with observed DNS requests by an [Autonomous System]-level adversary to yield perfectly precise DefecTor attacks for unpopular websites

New places to lurk

Domain names are hierarchical and the lookups for different parts of a domain (the parts separated by dots) are despatched to different DNS servers.
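As a rough illustration of that hierarchy, the sketch below lists whose name servers get asked when a name is resolved from scratch. It assumes, for simplicity, that every dot marks a hand-off to a different set of servers and that nothing is cached; neither is always true in practice, and the function name is invented for this example.

```python
def servers_consulted(name):
    """List the zones whose name servers are asked, root first,
    when resolving `name` with an empty cache.

    Simplifying assumption: each dot in the name is a delegation point.
    """
    labels = name.rstrip(".").split(".")
    chain = ["."]  # the root servers are always asked first
    # Then the servers for each successively longer suffix of the name.
    for i in range(len(labels) - 1, 0, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain

print(servers_consulted("nakedsecurity.sophos.com"))
# ['.', 'com.', 'sophos.com.']
```

Each entry in the list is a separate set of servers, potentially in a different part of the internet, and each conversation is a chance for someone on the path to listen in.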

This expands an adversary’s choice of places from which to observe traffic significantly because they can position themselves anywhere on the path between an exit relay and any of the name servers it has to communicate with to resolve a domain.

… for the Alexa top 1,000 most popular websites, 60% of the [Autonomous Systems] that are on the paths between the exit relay and the DNS servers required to resolve the sites’ domain names are not on the path between the exit relay and the website.
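The 60% figure is a set difference: the networks on the DNS paths minus the networks on the path to the website itself. A minimal sketch with invented path data (the AS numbers below come from the range reserved for documentation, and the function name is hypothetical):

```python
def dns_only_fraction(dns_path_ases, web_path_ases):
    """Fraction of ASes on the DNS paths that are NOT on the web path."""
    dns = set(dns_path_ases)
    web = set(web_path_ases)
    if not dns:
        return 0.0
    return len(dns - web) / len(dns)

# Invented example: five networks carry the DNS traffic, three carry the
# web traffic, and only two networks appear on both paths.
dns_path = {"AS64496", "AS64497", "AS64498", "AS64499", "AS64500"}
web_path = {"AS64496", "AS64497", "AS64501"}

print(dns_only_fraction(dns_path, web_path))  # 0.6
```

Every network counted in that fraction is somewhere an attacker could sit and watch DNS requests without ever being on the route to the website.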

Exit nodes can either make DNS lookups themselves or outsource them to a third-party resolver. Many choose to use their ISP but about a third of all the observed DNS requests coming from the Tor network went to a single entity: Google’s popular resolver – a situation the researchers describe as “alarming”.

How an exit node is configured also makes a considerable difference to the efficacy of the DefecTor attack: exit nodes that use their ISP for DNS lookups offered the smallest attack surface whilst those that relied on their own DNS resolvers were compromised much sooner.

Where you are matters

Perhaps the most surprising observation from the research is that where you are makes a big difference to your ability to stand up to correlation attacks using DNS.

The experiment was run from the five countries where Tor is most popular: the USA, UK, Germany, France and Russia. It found that traffic from the USA and UK was far harder to crack.

the median time until compromise differs by more than 10 days between UK, and RU or FR. In general, UK and US users are doing better than users in RU, FR, and DE for these two setups. We conclude that the location of Tor clients matters and should be considered in future traffic correlation studies.

What you can do

This new attack shouldn’t send anyone running for the hills – if your adversaries aren’t already in a position to conduct correlation attacks this probably won’t help them much.

In the short term, the authors of the paper would like to see the Tor project fix a bug that causes Tor to cache DNS entries for 60 seconds regardless of the DNS entry’s TTL (Time To Live).

In the longer term they’re also calling for Tor to implement DNS lookups over TLS (which would encrypt traffic between exit nodes and DNS resolvers), and suggest that defenses against website fingerprinting attacks in general should be “an important long-term goal”.

They also offer the following advice for exit node operators:

… exit relay operators should avoid public resolvers such as Google and OpenDNS. Instead, they should either use the resolvers provided by their ISP, or run their own, particularly if the operator’s ISP already hosts many other exit relays. Local resolvers can further be optimized to minimize information leakage, by (for example) enabling QNAME minimization
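QNAME minimization means a resolver tells each name server only as much of the name as that server needs in order to point to the next one. The sketch below compares what each server gets to see with and without it. It is a simplified model, not resolver code: the function name is invented and it assumes every dot in the name is a hand-off between servers.

```python
def queries_sent(name, minimise):
    """Return (servers asked, name revealed to them) pairs for a
    from-scratch resolution of `name`.

    Simplifying assumption: each dot in the name is a delegation point.
    """
    labels = name.rstrip(".").split(".")
    full = ".".join(labels) + "."
    sent = []
    for depth in range(len(labels)):
        # Zone whose servers are asked at this step ("." is the root).
        zone = ".".join(labels[len(labels) - depth:]) + "." if depth else "."
        if minimise:
            # Reveal just one more label than the zone already knows.
            qname = ".".join(labels[len(labels) - depth - 1:]) + "."
        else:
            qname = full
        sent.append((zone, qname))
    return sent

for zone, qname in queries_sent("nakedsecurity.sophos.com", minimise=True):
    print(f"servers for {zone!r} see {qname!r}")
```

Without minimization the root and com servers, and anyone watching the traffic to them, see the full name nakedsecurity.sophos.com. With it they see only com and sophos.com respectively, and only the final server sees the whole name.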

Site operators worried about their users’ anonymity can bypass the DNS system entirely, and stay within the Tor network, by running their site as a hidden service.