An anonymous researcher just published a paper that claims to have mapped out almost the entire internet for the first, and perhaps the last, time.
I know what you’re going to say.
“Wow!”
Or, perhaps, if you’re a slightly less trusting sort, “Oh, really? How?”
The answer, apparently, is, “Using a botnet.”
The author claims to have developed a small virus that he compiled for nine different sorts of router using the software development tools from the OpenWRT project.
OpenWRT is open-source router firmware: a Linux distribution originally targeting the Linksys WRT54 SoHo router, and derived from the source code published by Linksys years ago to comply with the GPL licensing requirements.
→ The GPL, or GNU General Public License, says “you can use this code for free, but if you use it to make a product you give to someone else, you must give them this source code, together with all your modifications.” The theory is that you won’t be able to take someone else’s GPL source and hide it behind your own proprietary tweaks. Your tweaks must themselves become open source, so the next guy can do his own tweaks, and so on.
With a size below 60KByte, the bot could run even on modestly-powered router hardware, such as a WRT54 router or similar, and could set out to measure the world, one network at a time.
The author dubbed his botnet Carna, and describes the theory of its operation in simple, exponential, terms:
After completing the scan of roughly one hundred thousand IP addresses, we realized the number of insecure devices must be at least one hundred thousand. Starting with one device and assuming a scan speed of ten IP addresses per second, it should find the next open device within one hour. The scan rate would be doubled if we deployed a scanner to the newly found device. After doubling the scan rate in this way about 16.5 times, all unprotected devices would be found; this would take only 16.5 hours. Additionally, with one hundred thousand devices scanning at ten probes per second we would have a distributed port scanner to port scan the entire IPv4 Internet within one hour.
Very loosely put, that’s just what he did. Or says he did.
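The arithmetic in the quoted passage is easy to check for yourself. The sketch below is not from the paper: it simply plugs the quoted figures into a few lines of Python, assuming one probe per address and insecure devices spread evenly across the IPv4 address space. The variable names are mine.

```python
import math

IPV4_ADDRESSES = 2 ** 32      # size of the whole IPv4 address space
INSECURE_DEVICES = 100_000    # lower-bound estimate quoted from the paper
PROBES_PER_SECOND = 10        # scan speed per device, as quoted

# Doublings needed to grow from one scanner to 100,000 scanners.
doublings = math.log2(INSECURE_DEVICES)

# Assumption: targets are spread evenly, so a lone scanner expects to hit
# one roughly every 2**32 / 100,000 addresses.
addresses_per_hit = IPV4_ADDRESSES / INSECURE_DEVICES
hours_to_first_hit = addresses_per_hit / PROBES_PER_SECOND / 3600

# Time for 100,000 scanners to probe every IPv4 address once.
full_scan_hours = IPV4_ADDRESSES / (INSECURE_DEVICES * PROBES_PER_SECOND) / 3600

print(f"doublings needed:        {doublings:.1f}")
print(f"hours to first open box: {hours_to_first_hit:.2f}")
print(f"hours for a full scan:   {full_scan_hours:.2f}")
```

On these assumptions the numbers come out at about 16.6 doublings and about 1.2 hours for each of the two time estimates, so the paper's “16.5” and “one hour” look like round figures rather than exact results.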
On an infected device, the virus worked something like this:
- Open a port for remote access by the central internet mapping systems.
- Reach out to scan and record details about a subset of the rest of the internet.
- Identify routers with telnet open onto the internet and a weak root password, e.g. root:root, admin:admin or either account with no password.
- Log in and install the virus on the next open router in the ever-growing tree of zombies.
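If you look after any routers yourself, the weak logins listed above are worth auditing for. The following is a minimal defensive sketch, not anything taken from the paper: it checks a list of credentials you already know against the defaults the article mentions. The function name and the inventory entries are hypothetical.

```python
# Weak telnet logins named in the article: root:root, admin:admin,
# or either account with no password at all.
WEAK_ACCOUNTS = ("root", "admin")


def is_weak_login(username: str, password: str) -> bool:
    """Return True if the pair matches one of the defaults the article lists."""
    if username not in WEAK_ACCOUNTS:
        return False
    return password == "" or password == username


# Hypothetical inventory of devices you administer (names are made up).
inventory = {
    "office-router": ("admin", "admin"),
    "lab-gateway": ("root", ""),
    "home-router": ("admin", "correct horse battery staple"),
}

at_risk = sorted(
    name for name, (user, pw) in inventory.items() if is_weak_login(user, pw)
)
print(at_risk)  # ['lab-gateway', 'office-router']
```

Anything that shows up in `at_risk` needs a new password, and telnet should ideally not be reachable from the internet at all.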
The central mapping servers used the newly-opened ports on infected devices to collect data from the botnet.
According to the author, this is a neater and safer way than having the infected routers call home, because the command-and-control system is not itself reachable, and thus cannot easily be abused or knocked offline.
This is different from most cybercrime bots, because PC-infecting zombies are usually behind firewalls, and thus can only work by calling outwards.
Because this botnet only infects routers that it knows it can already connect to, it can rely (for the most part) on connecting into the routers a second time to acquire its results.
There is a raft of other details described in the paper: various rate-limiting precautions in the virus to reduce the risk of interfering with infected routers; the use of “middle nodes”, or staging routers, to collect and relay results from round about; and drop-dead logic in the virus itself so that it won’t run forever.
After all, if you weren’t invited in the first place, it’s really important not to outstay the welcome you never got.
The results, at least in summary, appear to be spectacularly detailed, with a claimed 420,000 infected routers identifying and geolocating some 1,300,000,000 devices on the IPv4 part of the internet, with about one-third of those responding directly to pings.
The author has also produced a range of intriguing and colourful visualisations, even if some of them don’t say much more than that North America, Europe and Japan are better-connected than, say, Africa.
Like I said at the start, “Wow!”
What I can’t say, at least right now, is whether the results are legitimate, or just a synthetic summary of a giant hoax.
That’s because I simply don’t have the personal bandwidth to verify the data any time soon: it’s a 568GByte BitTorrent download that expands to a whopping 9TByte.
Also, it’s compressed using ZPAQ, which squashes the data to one-third the size of gzip, but requires giant CPU effort to decompress. (The author actually recommends using a distributed unpacker that runs on multiple computers on your LAN – so you just about need a miniature botnet to make sense of the giant one.)
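To put those sizes in perspective, here is a rough back-of-the-envelope calculation. It assumes 1TByte = 1000GByte and a hypothetical 10Mbit/sec connection; only the 568GByte, 9TByte and one-third-of-gzip figures come from the article.

```python
DOWNLOAD_GB = 568        # compressed ZPAQ download, from the article
UNPACKED_GB = 9_000      # "9TByte", assuming 1 TByte = 1000 GByte
LINK_MBIT_PER_S = 10     # hypothetical connection speed, not from the article

# How much the data shrinks under ZPAQ.
compression_ratio = UNPACKED_GB / DOWNLOAD_GB

# "One-third the size of gzip" implies a gzip archive about three times larger.
gzip_estimate_gb = DOWNLOAD_GB * 3

# 1 GByte = 8000 Mbit; 86,400 seconds in a day.
download_days = DOWNLOAD_GB * 8_000 / LINK_MBIT_PER_S / 86_400

print(f"compression ratio: about {compression_ratio:.1f} to 1")
print(f"gzip equivalent:   about {gzip_estimate_gb} GByte")
print(f"download time:     about {download_days:.1f} days")
```

On those assumptions, the download alone would tie up the line for more than five days, before any unpacking starts.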
Let’s assume that it’s all true.
If so, this is probably the first time since the earliest days of the internet that a map of this detail has been produced.
And, given the slow but steady increase in the much-harder-to-scan IPv6 internet, it may very well, as the author points out, be the last such map ever made.
And that, my friends, puts you on the horns of a dilemma.
This may be a once-in-a-lifetime opportunity.
But it seems hard to argue that this data was lawfully acquired, since it relied on a virus that infected and replicated without permission.
Indeed, the nightmare legalistic phrase “The defendant faces 420,000 charges under the Computer Misuse Act for Unauthorised Access and Unauthorised Modification” leaps to mind.
Which raises the question: no matter how useful or interesting the data, is it ethical to use it?
I’m not sure.
I gave you the link above, and I’ve used one of the world maps to give you a feel for the scale of this stuff.
But my gut feeling about the 9TByte of downloadable data is, “Don’t use it. Stay away. Don’t give your implicit approval for data collection done this way.”
What do you think? Vote in our poll…
I find this to be the most interesting point of his document, "…while everybody is talking about high class exploits and cyberwar, four simple stupid default telnet passwords can give you access to hundreds of thousands of consumer as well as tens of thousands of industrial devices all over the world."
Now *that* aspect of the commentary I agree with and approve of 🙂
As a friend pointed out earlier, mobile internet is far more prevalent in certain areas (e.g. Africa) than fixed line, and this data would not be included in the census because mobile users are behind their telco's NAT.
That and IPv6 aside, I think this is about as accurate as you'd get – unless the big G decided to make their data public – everyone uses Google, right? 😉
On a mildly related point, the prevalence of wireless internet in the developing world is also an indication of why this sort of "research", conducted as it is at the router owner's expense, isn't quite as harmless as you might think…
If you've seen the price of internet access in some parts of the world, you might think twice about "borrowing" other people's data without authorisation, even if it's only for a 60KByte upload/download and modest number of ICMP packets.
So… he named it "Carna"? Seems to me he should have known that Carnal knowledge without consent would constitute a crime.
How about a Hmmmm option for those of us who can't decide if this was more for the good or more for the bad
Sorry. No equivocation 🙂
You know… I was thinking, it really doesn’t matter whether it’s ethical or not (not anymore). Let’s say the answer to that question is no: individuals and/or governments will still justify or condone the use of this information as long as it suits their own needs, good intentions or not. It’s just the world we live in, at least that’s how I see it.
Great article!
As true as that may be, it doesn’t change the actual ethecality (is that a word?) of using the information.
Sounds like the Morris Worm all over again?
http://en.wikipedia.org/wiki/Morris_worm
I believe that the courts would say he did no wrong. There was no security and he simply walked in through an open front door. Add to that that there were no damages – monetary or otherwise – and all traces of the activity have been removed from the device and indeed, rebooting the device renders it 'whole'. We all have seen the security messages when we log on to our computers at work – that message relieves our workplaces of liability because they actually told the bad guys to stay out. The opposite is true in this case. The user was invited in as no security had been set – not even a change of password.
I hear you, even if I don't agree entirely.
There *was* security, albeit that it was very poor, and there *was* cost to the user (see my comment to @nedge2K above), albeit that it wasn't much to people in the richer parts of the world.
Anyway, if I were the anonymous researcher, I wouldn't want to bet that every court in every jurisdiction would agree with you…
Trouble is, the ruling of the court would have to be based upon the local law applicable to the country that the compromised device was within.
e.g. see the UK “Computer Misuse Act 1990”:
http://www.legislation.gov.uk/ukpga/1990/18/section/1
I’d say this is a clear (in eyes of the law) breach of CMA 1990. But, I’m no legal bod.
I’m guessing obvious defence would be to state that a router is not a “computer”. But, doubt that would work.
I’m guessing that other countries have far more severe laws & penalties. And, the “researcher” will be in breach of all those, too.
I’m guessing the FBI will already be on the case here.
The biggest news here is: this is telling malware/botnet writers to start targeting routers/devices. They won’t be protected. And once “infected”, won’t be detected. They’ll get away with more for longer.
Having been involved in the writing of CMA 1990 I can say emphatically that the intention was that this type of activity would be an offence.
In fact I think that quite a lot of what goes on 'automatically' in our computers is in breach of CMA because the computer owner has never knowingly authorised it. The companies who use our computers for various doubtful purposes get away with it because they put some deliberately unclear line, that covers a multitude of sins, into the T&Cs and we users merrily click the accept button because a) we want to get on with whatever we are doing and have to use the product, b) we just aren't interested in all the waffle, and c) we don't understand it when we do read it. Oh, and d) 7 million in the UK were educated there and are illiterate as a result.
…so, if there were notices that expressly forbade unwanted access even though the passwords were default or nonexistent… would it have mattered? Do you really think those routers would have been skipped?
Was it a he or she?
Self-evidently interesting, but not noteworthy until verified and validated. Would a statistical analysis of the 9TB of data help decide if it's a hoax or not?
If it is true, one immediate side effect is that someone has just proved (1) you could stop a significant number of internet users directly by logging into their routers with the defaults, (2) you could use the same technique as the author to take control of a massive distributed albeit low powered platform, and tailor for your own use.
(part 1 of 2)
—
My comments are my own and in no way should be interpreted as representing the views of any of my employers, past, present or future.
As regards the rights and wrongs of what he did, it is clearly illegal, and had some (unmeasured) impact on the owners of the devices. He (and we) also benefit from the fact it didn't go spectacularly wrong.
I'd temper my judgement by reflecting on the illegal (and arguably immoral) practices of some in the medical profession in the 18th and 19th centuries, of paying for cadavers to dissect for research. Without that effort, medical science would be nowhere near as advanced as it is. What is not so clear is whether this hack (in all senses) provides even a tiny percentage of the benefit for us compared to the medical scenario.
(Part 2 of 2)
I agree that what he has done is wrong but don't throw the data away just because of how it was acquired. There are many instances in science where data originated from methods using less than ideal standards but the data is still valuable.
My first thoughts are – exactly what is the benefit of analysing the 9TB of nodes out there? What could you do with it? Other than take a cautionary message about securing your own router?
My second thoughts are – presumably, the burning up of the enormous cumulative bandwidth by the hundreds of thousands of iOS / Android apps out there that send and receive data without the users' knowledge is less unethical, but in many cases only marginally so.
Third thoughts are that with IPv6, whilst you could only really collect the routing table without any hacking, in many cases, it would actually give you quite a good picture of the IPv6 internet without having to invade any routers.
There is not a government on the planet that would think twice about using the "acquired" data if it served their purposes.
Verify his data by checking his IP map against your own IP addresses. I can attest that my own /29 is correctly represented in his visualization.
My other question concerns how this data might be used. If, for example, an ISP used this information to learn which devices may be compromisable, analysed the serial numbers / MAC addresses of the connected CPE across its subscriber base, and then pro-actively contacted the affected customers, wouldn't that be a good thing?
While I agree with you that the research used highly illegal means to gather the data, there is a rather uncomfortable situation: now that the research is available, a large number of devices out there are open to exploitation if an unscrupulous person decides to leverage the gathered information for their own, even more illegal, purposes.
Wouldn't ISPs have a duty of care to contact their customer base and inform them that they could have the potential to be compromised? And to do that would require leveraging said data available via Torrent?
This is a good exam question for the CISSP or other computer security exams. What statute did this guy violate? Or use it for an interview question to check a candidate's ethics. Contrast this to Google hacking or Shodan where it looks dodgy but doesn't actually break laws. This person better stay anonymous, they're in a heap of trouble.
Reading the article and several of the posts here it seems we are not making any differentiation between what is "ethical" and what is "legal". The two are not the same thing… while it probably was not technically legal, no harm was done and it produced something with educational value. The information was shared, the techniques and methods were shared… nothing was hidden (assuming everything is as it has been portrayed). Therefore, I can't see anything unethical about this. If people don't like that their routers were used this way they should start by changing their passwords before they start crying foul.
I think that there are two issues here.
The first is that the results of this survey should provide a wake-up call to internet users who are remotely interested in their on-line security. The fact that the information was gathered in a completely harmless way (if we are to believe the tool’s creator) makes the exercise a plus in my opinion.
The second issue is that of violation of privacy. Someone previously mentioned that “the door was left open and someone walked in”.
This line of thinking rather bothers me, since I suspect that such a statement reflects our post-internet view of privacy, which is very different indeed from that of, say, 40 years ago.
Here’s an analogy:
What the creator of this tool did is much like walking down every street in your city, entering every unlocked front door and re-arranging the books in the bookcase. No real harm done, but a violation nonetheless. The fact that you left the door open does not mean that it’s ok to enter it uninvited; that’s called trespassing in most societies.
In pre-internet days, this would have been considered unacceptable, but we now view such activity as sort of OK.
Objectively, the research gets a plus from me. It does however raise a small red flag about our view of privacy on an ethical and moral level.
And, now having been published, what makes us think that the NSA hasn't already done this or is not continuing to do this, particularly given that the directions have been published, that the NSA has the tools (storage and CPU), and the will?