Cloudflare mistakes own DNS for DDoS attack

When is a DDoS attack not a DDoS attack?

In the case of Cloudflare’s much-vaunted and recently launched DNS service, the answer is when the company diligently starts blocking a DDoS event which turns out to have been caused by something much closer to home.

Users who pointed their DNS resolution at Cloudflare’s resolver at router level on 31 May would have noticed a 17-minute disruption to DNS resolution for every device on their network, starting at 17:58 UTC.

Users doing the same from a Windows, Linux or Mac computer would have noticed the same effect but only on that device.

Anyone who had the presence of mind to switch to a different DNS service (the Global Cyber Alliance’s resolver or their ISP’s default, say) would have noticed that website domains were suddenly resolving again. This would have been a good clue that something wasn’t quite right.

A DNS resolver disappearing for that long might indicate some kind of DDoS attack which, given that Cloudflare offers tier-one DDoS mitigation through something called Gatebot, would have to have been pretty remarkable to make any headway.

Cloudflare has now posted a blog in which it admitted it suffered an unusual and rare type of DDoS attack – an imaginary one.

Put simply, Cloudflare’s Gatebot suddenly started interpreting traffic to the company’s own DNS resolver (that is, queries sent to and from its users) as a DDoS attack on its infrastructure.

Whoops! It sounds bizarre at first but, as the company explains, Gatebot normally queries a hard-coded list of IP address ranges to check whether traffic is emanating from Cloudflare or is external.

On 31 May, Gatebot was pointed at a new Provision API, an innovation intended to reduce the overheads and risks of the old system’s manual updating process.

Unfortunately, the address ranges used by its DNS resolver service required an exception to be added:

As you might be able to guess by now, we didn’t implement this manual exception while we were doing the integration work. Remember, the whole idea of the fix was to remove the hardcoded gotchas!

As a result, Gatebot saw the DNS queries as an attack and did the job it was built for.
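To make the failure mode concrete, here is a minimal sketch of how dropping a manual exception can flip a classification from "ours" to "external". It is not Gatebot’s actual logic: the prefix lists, the `should_mitigate` rule and the resolver address are all hypothetical, and the RFC 5737 documentation ranges stand in for real Cloudflare address space.

```python
import ipaddress

# Hypothetical prefixes returned by a provisioning source. RFC 5737
# documentation ranges stand in for real address space.
PROVISIONED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical manual exception: a range the provisioning source does
# not list but which must still be treated as internal.
MANUAL_EXCEPTIONS = [
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_internal(addr, prefixes):
    """Return True if addr falls inside any of the given prefixes."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in prefixes)


def should_mitigate(addr, include_exceptions):
    """Deliberately simplified rule: anything not recognised as internal
    is treated as external traffic eligible for DDoS mitigation."""
    prefixes = list(PROVISIONED_PREFIXES)
    if include_exceptions:
        prefixes += MANUAL_EXCEPTIONS
    return not is_internal(addr, prefixes)


resolver_addr = "203.0.113.1"  # hypothetical resolver address
print(should_mitigate(resolver_addr, include_exceptions=True))   # False
print(should_mitigate(resolver_addr, include_exceptions=False))  # True
```

With the exception list in place, the resolver’s address is recognised as internal and left alone. Leave it out, as happened during the Provision API integration, and the same traffic falls through to the "external, mitigate it" branch.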

While Gatebot, the DDoS mitigation system, has great power, we failed to test the changes thoroughly. We are using today’s incident to improve our internal systems.

This is not the first incident to affect the service since it launched on 1 April. As well as the occasional BGP leak (a type of rerouting which can be malicious but usually isn’t), the service was inadvertently blocked by home gateways supplied by AT&T.

But why do DNS services such as Cloudflare’s matter anyway?

The traditional reason was performance, with users dissatisfied by the speed at which their ISP’s DNS server would resolve web domain names to their underlying IP addresses.

Cloudflare’s service was intended to offer a second benefit: privacy. Although this is very much a work in progress (standard DNS queries travel unencrypted and are collected by ISPs in the UK and US for a variety of reasons), in the long term the service lays the foundation for new encrypted DNS standards, principally DNS-over-HTTPS and DNS-over-TLS, to build on.

But alternative DNS resolvers always stand and fall on their availability and reliability. There is no point in offering faster and more private DNS if it’s not there when users need it. Which is why Cloudflare has earnestly promised:

The next time we mitigate traffic, we will make sure there is a legitimate attack hitting us.