Last week, Fidelis Cybersecurity Threat Research reported multiple attacks on internet-facing Hadoop Distributed File System (HDFS) clusters worldwide. The attacks followed spikes in traffic to port 50070, which Hadoop 2.x uses by default for the HDFS NameNode’s web interface, from a single IP address purportedly originating in China: 22.214.171.124.
The Hadoop attacks follow a wave of ongoing attacks on MongoDB, Elasticsearch, and Apache CouchDB. In some cases, criminals have copied and then wiped databases, claiming to hold the originals for ransom. In others, they have simply deleted the data without demanding payment.
For the Hadoop attackers, simple data destruction seems to be the primary goal so far. As Computer Business Review reports:
In one incident, Fidelis observed an attacker erasing most of the directories and creating a single directory called “NODATA4U_SECUREYOURSHIT”. There was no attempt to claim a ransom or any other communication — the data was simply deleted and the directory name was left as a calling card.
This time last year, Swiss infosecurity firm BinaryEdge claimed to have seen more than 1,000 Hadoop attacks. If that very high figure holds up, it likely represents a significant share of internet-connected Hadoop installations worldwide, which the GDI Foundation estimates at 5,300 and Fidelis at between 8,000 and 10,000. If each attack hit a separate cluster, that would be roughly 10 to 19 percent of them.
Why are Hadoop clusters so exposed? It’s easy to deploy Hadoop with weak security, and tightening it takes effort. As the GDI Foundation’s Victor Gevers recently pointed out, Hadoop’s default web interface settings leave things wide open:
The default installation for HDFS Admin binds to the IP address 0.0.0.0 and allows any unauthenticated user to perform super user functions to a Hadoop cluster [via a web browser]… including destroying data nodes, data volumes, or snapshots with TBs of data in seconds.
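In practice, this means anyone who can reach the NameNode’s HTTP port can issue filesystem operations without credentials. The Python sketch below is one way an administrator might check their own cluster for that exposure. It assumes Hadoop 2.x defaults (NameNode HTTP server on port 50070, WebHDFS enabled) and sends the WebHDFS REST call GET /webhdfs/v1/?op=LISTSTATUS, a read-only listing of the root directory. The function name webhdfs_allows_anonymous_listing is hypothetical, not part of any Hadoop tooling. A JSON listing returned with no credentials at all suggests the cluster is open in the way Gevers describes.

```python
import json
import urllib.request


def webhdfs_allows_anonymous_listing(host: str, port: int = 50070, timeout: float = 5.0) -> bool:
    """Return True if an unauthenticated WebHDFS LISTSTATUS on "/" succeeds.

    Hypothetical helper. Assumes Hadoop 2.x defaults: NameNode HTTP server on
    port 50070 with WebHDFS enabled. Only probe clusters you administer.
    """
    url = f"http://{host}:{port}/webhdfs/v1/?op=LISTSTATUS"
    # Ignore any environment proxy settings so we test direct reachability.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused or timed-out connections and HTTP errors such as
        # 401/403 (urllib's URLError and HTTPError subclass it); ValueError covers
        # a response that is not valid UTF-8 JSON.
        return False
    # A successful WebHDFS listing looks like {"FileStatuses": {"FileStatus": [...]}}.
    return isinstance(body, dict) and "FileStatuses" in body
```

A False result is not proof of safety. The probe also returns False when the port is simply firewalled from wherever you run it, and Hadoop 3.x moved the NameNode web UI to port 9870 by default.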
People experimenting with Hadoop can easily get careless, because it’s simpler to leave its inadequate default security as they found it. As Teradata’s Ben Davis wrote last summer:
Sounds simple, but if you are security conscious then you’re probably going to have your cluster without internet access to prevent breaches… During deployment, Hadoop loves being connected to the internet because you need a host of packages from JDKs, Python and JDBC drivers during the installation. [Without] internet access, then be prepared to double the effort to deploy.
Of course, as The Register notes, disconnecting from the internet isn’t an option for organizations that run Hadoop through Platform-as-a-Service hosts. But even for clusters you host yourself, Davis observes, “it is easier to deploy Hadoop in a fairly low security configuration… because there are a range of ports that Hadoop talks on and having an incorrectly configured firewall can cause you problems”.
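Davis’s point about the range of ports is easy to check from outside the firewall. The sketch below attempts a plain TCP connection to several well-known Hadoop 2.x default ports and reports which ones answer. The port list is an assumption based on stock defaults (your cluster may override them, and Hadoop 3.x changed several), and reachable_ports is a hypothetical helper name.

```python
import socket

# Assumed Hadoop 2.x default ports; later releases moved several of them
# (e.g. the NameNode web UI to 9870 in Hadoop 3.x). Adjust to your configuration.
HADOOP2_PORTS = {
    8020: "NameNode RPC (default when fs.defaultFS omits a port)",
    8088: "YARN ResourceManager web UI",
    50010: "DataNode data transfer",
    50070: "NameNode web UI / WebHDFS",
    50075: "DataNode web UI",
}


def reachable_ports(host: str, ports: dict = HADOOP2_PORTS, timeout: float = 2.0) -> dict:
    """Return {port: description} for every port that accepts a TCP connection."""
    open_ports = {}
    for port, description in ports.items():
        try:
            # create_connection resolves the host and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                open_ports[port] = description
        except OSError:
            # Refused, filtered (timed out), or unresolvable: not reachable from here.
            pass
    return open_ports
```

Run it from a machine outside your network, against hosts you administer. Ideally it returns an empty dict; any port it lists is one your firewall is letting through.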
Bottom line: once Hadoop is running, you need to put in the work to configure firewalls, users, groups, Kerberos, and SSL. It’s easy to imagine that not everyone does, especially when they’re just testing what Hadoop can do. Unfortunately, not everyone experimenting with a test cluster is using harmless data. And there’s no telling how many production Hadoop instances have suboptimal security.
So, what can you do about it? If you’re running a Hadoop instance that’s exposed to the internet without (at least) strong authentication, disconnect it until you’ve secured it. Generic directions are available for configuring Kerberos authentication for Hadoop in secure mode, and at least one white paper offers more systematic ideas for securing Hadoop. Meanwhile, think carefully about your options for backup and recovery, which continue to evolve; several helpful discussions of the topic are available.
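Whichever guide you follow, it helps to check what a cluster’s configuration files actually say. The sketch below merges the property name/value pairs from Hadoop *-site.xml files over a small table of Hadoop 2.x defaults, then flags four settings that leave a cluster open: non-Kerberos authentication, disabled service-level authorization, unauthenticated HTTP on the web consoles, and a NameNode web UI bound to 0.0.0.0. The property keys are real Hadoop settings; the helper names load_properties and audit are hypothetical, and the script ignores ${...} variable substitution and every setting not listed.

```python
import xml.etree.ElementTree as ET

# Assumed Hadoop 2.x defaults (core-default.xml / hdfs-default.xml) for the keys we check.
DEFAULTS = {
    "hadoop.security.authentication": "simple",
    "hadoop.security.authorization": "false",
    "hadoop.http.authentication.type": "simple",
    "dfs.namenode.http-address": "0.0.0.0:50070",
}


def load_properties(*xml_texts: str) -> dict:
    """Merge <property><name>/<value> pairs from Hadoop *-site.xml documents.

    Values start from DEFAULTS; later documents override earlier ones,
    mirroring how site files override the shipped defaults.
    """
    props = dict(DEFAULTS)
    for text in xml_texts:
        root = ET.fromstring(text)
        for prop in root.iter("property"):
            name = prop.findtext("name")
            value = prop.findtext("value")
            if name and value is not None:
                props[name.strip()] = value.strip()
    return props


def audit(props: dict) -> list:
    """Return human-readable findings for settings that leave a cluster open."""
    findings = []
    if props["hadoop.security.authentication"].lower() != "kerberos":
        findings.append("RPC authentication is not Kerberos (secure mode is off)")
    if props["hadoop.security.authorization"].lower() != "true":
        findings.append("service-level authorization is disabled")
    if props["hadoop.http.authentication.type"].lower() == "simple":
        findings.append("web consoles use 'simple' (unauthenticated) HTTP auth")
    # Host part of "host:port"; 0.0.0.0 means the UI listens on every interface.
    host = props["dfs.namenode.http-address"].rsplit(":", 1)[0]
    if host in ("0.0.0.0", ""):
        findings.append("NameNode web UI listens on all interfaces")
    return findings
```

An out-of-the-box configuration trips all four checks, while a cluster with Kerberos enabled and the NameNode UI bound to an internal address passes cleanly.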
And if you’re running another internet-connected database technology that hasn’t been targeted yet, be proactive. As PC World’s Lucian Constantin writes:
Destructive attacks against online database storage systems are not likely to stop soon because there are other technologies that have not yet been targeted and that might be similarly [unprotected]…
You’ve been warned.