Researcher uses botnet to map internet – vital public service, or cybercriminal dodginess? [POLL]

An anonymous researcher just published a paper that claims to have mapped out almost the entire internet for the first, and perhaps the last, time.

I know what you’re going to say.

“Wow!”

Or, perhaps, if you’re a slightly less trusting sort, “Oh, really? How?”

The answer, apparently, is, “Using a botnet.”

The author claims to have developed a small virus that he compiled for nine different sorts of router using the software development tools from the OpenWRT project.

OpenWRT is open-source router firmware: a Linux distribution originally targeting the Linksys WRT54 SOHO router, and derived from the source code that Linksys published years ago to comply with the GPL licensing requirements.

→ The GPL, or GNU General Public Licence, says “you can use this code for free, but if you use it to make a product you give to someone else, you must give them this source code, together with all your modifications.” The theory is that you won’t be able to take someone else’s GPL source and hide it behind your own proprietary tweaks. Your tweaks must themselves become open source, so the next guy can do his own tweaks, and so on.

At under 60KByte in size, the bot could run even on modestly powered router hardware, such as a WRT54 router or similar, and could set out to measure the world, one network at a time.

The author dubbed his botnet Carna, and describes the theory of its operation in simple, exponential, terms:

After completing the scan of roughly one hundred thousand IP addresses, we realized the number of insecure devices must be at least one hundred thousand. Starting with one device and assuming a scan speed of ten IP addresses per second, it should find the next open device within one hour. The scan rate would be doubled if we deployed a scanner to the newly found device. After doubling the scan rate in this way about 16.5 times, all unprotected devices would be found; this would take only 16.5 hours. Additionally, with one hundred thousand devices scanning at ten probes per second we would have a distributed port scanner to port scan the entire IPv4 Internet within one hour.
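The arithmetic in that quote is easy to check. Here is a minimal back-of-the-envelope sketch, assuming (as the author does) roughly 100,000 open devices, ten probes per second per device, and about one hour per doubling; the function names are purely illustrative.

```python
import math

IPV4_ADDRESSES = 2 ** 32  # size of the entire IPv4 address space


def doublings_needed(devices: int) -> float:
    """Number of doublings to grow from 1 infected device to `devices`.

    With roughly one hour per doubling (the author's assumption),
    this is also the approximate number of hours needed.
    """
    return math.log2(devices)


def full_scan_seconds(devices: int, probes_per_second: int) -> float:
    """Time for `devices` scanners, each at `probes_per_second`,
    to probe every IPv4 address once."""
    aggregate_rate = devices * probes_per_second
    return IPV4_ADDRESSES / aggregate_rate


if __name__ == "__main__":
    print(f"doublings needed: {doublings_needed(100_000):.1f}")
    print(f"full IPv4 sweep: {full_scan_seconds(100_000, 10) / 60:.0f} minutes")
```

The doubling estimate holds up (log2 of 100,000 is about 16.6), while a full sweep at one million probes per second comes out at a little over an hour, about 72 minutes, so “within one hour” is a rounded figure.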

Very loosely put, that’s just what he did. Or says he did.

On an infected device, the virus worked something like this:

  • Open a port for remote access by the central internet mapping systems.
  • Reach out to scan and record details about a subset of the rest of the internet.
  • Identify routers with telnet open onto the internet and a weak root password, e.g. root:root, admin:admin or either account with no password.
  • Log in and install the virus on the next open router in the ever-growing tree of zombies.

The central mapping servers used the newly-opened ports on infected devices to collect data from the botnet.

According to the author, this is a neater and safer way than having the infected routers call home, because the command-and-control system is not itself reachable, and thus cannot easily be abused or knocked offline.

This is different from most cybercrime bots, because PC-infecting zombies are usually behind firewalls, and thus can only work by calling outwards.

Because this botnet only infects routers that it knows it can already connect to, it can rely (for the most part) on connecting into the routers a second time to acquire its results.

There is a raft of other details described in the paper: various rate-limiting precautions in the virus to reduce the risk of interfering with infected routers; the use of “middle nodes”, or staging routers, to collect and relay results from nearby infected routers; and drop-dead logic in the virus itself so that it won’t run forever.

After all, if you weren’t invited in the first place, it’s really important not to outstay the welcome you never got.

The results, at least in summary, appear to be spectacularly detailed, with a claimed 420,000 infected routers identifying and geolocating some 1,300,000,000 devices on the IPv4 part of the internet, with about one-third of those responding directly to pings.
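To put those claimed numbers in proportion, here is a quick sketch that simply divides them out. The inputs are the paper’s claims as reported above, not figures anyone has verified, and the variable names are illustrative only.

```python
# Claimed figures from the paper, as summarised above (unverified).
INFECTED_ROUTERS = 420_000
DEVICES_FOUND = 1_300_000_000
PING_FRACTION = 1 / 3  # "about one-third" responded to pings

# Average number of devices identified per infected router.
devices_per_router = DEVICES_FOUND / INFECTED_ROUTERS

# Approximate number of devices that answered pings directly.
ping_responders = DEVICES_FOUND * PING_FRACTION

print(f"devices per infected router: {devices_per_router:,.0f}")
print(f"ping responders: about {ping_responders / 1e6:,.0f} million")
```

In other words, each zombie router would have accounted for roughly 3,100 devices on average, and around 430 million devices answered pings.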

The author has also produced a range of intriguing and colourful visualisations, even if some of them don’t say much more than that North America, Europe and Japan are better-connected than, say, Africa.

Like I said at the start, “Wow!”

What I can’t say, at least right now, is whether the results are legitimate, or just a synthetic summary of a giant hoax.

That’s because I simply don’t have the personal bandwidth to verify the data any time soon: it’s a 568GByte BitTorrent download that expands to a whopping 9TByte.

Also, it’s compressed using ZPAQ, which squashes the data to about one-third of the size that gzip would manage, but requires an enormous amount of CPU effort to decompress. (The author actually recommends using a distributed unpacker that runs on multiple computers on your LAN – so you just about need a miniature botnet to make sense of the giant one.)
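A rough sketch of what those sizes imply, assuming decimal units (1TByte = 1000GByte) and assuming the one-third-of-gzip ratio applies uniformly across the whole dataset, which is a simplification:

```python
# Sizes quoted in the article; decimal units assumed.
DOWNLOAD_GB = 568     # ZPAQ-compressed BitTorrent download
EXPANDED_GB = 9_000   # 9TByte once unpacked
GZIP_SIZE_FACTOR = 3  # ZPAQ output is about one-third the size of gzip's

# Overall compression ratio achieved by ZPAQ.
overall_ratio = EXPANDED_GB / DOWNLOAD_GB

# Roughly how big the same data would be if it were gzipped instead.
gzip_download_gb = DOWNLOAD_GB * GZIP_SIZE_FACTOR

print(f"ZPAQ ratio: about {overall_ratio:.1f}:1")
print(f"gzip equivalent: about {gzip_download_gb:,}GB")
```

That works out at roughly 16:1 overall, and a gzipped equivalent of around 1.7TByte, which is why the author accepted the heavy decompression cost.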

Let’s assume that it’s all true.

If so, this is probably the first time since the earliest days of the internet that a map of this detail has been produced.

And, given the slow but steady increase in the much-harder-to-scan IPv6 internet, it may very well, as the author points out, be the last such map ever made.

And that, my friends, puts you on the horns of a dilemma.

This may be a once-in-a-lifetime opportunity.

But it seems hard to argue that this data was lawfully acquired, since it relied on a virus that infected and replicated without permission.

Indeed, the nightmare legalistic phrase “The defendant faces 420,000 charges under the Computer Misuse Act for Unauthorised Access and Unauthorised Modification” leaps to mind.

Which raises the question: no matter how useful or interesting the data, is it ethical to use it?

I’m not sure.

I gave you the link above, and I’ve used one of the world maps to give you a feel for the scale of this stuff.

But my gut feeling about the 9TByte of downloadable data is, “Don’t use it. Stay away. Don’t give your implicit approval for data collection done this way.”

What do you think? Vote in our poll…