Why are redditors ripping images from Instagram? Because they can

Poor Instagram users. If it’s not one thing, it’s another.

Recently, it was a leaky API that led to 6m high-profile accounts getting hacked (and their details subsequently put up for sale at $10 a pop) – including the likes of Emma Watson, Taylor Swift, Selena Gomez and Harry Styles.

Before that, Instagram supplied us with yet another example of why you should be careful about adding friends on the platform (or on any social media platform, for that matter)… and why you should be careful about whom you consider your “friends”…

… Namely, the creeps posing as friends who can be found on the creepshot-sharing site Anon-IB, where users have posted images they say they took from Instagram feeds of “a friend”.

And now we have a new breed of data mosquito feeding on Instagram’s neck: redditors who are out to archive (in other words, to steal) every single Instagram image, whether it was posted publicly or stored in supposedly locked accounts.

Why? Well, in a nutshell, because they can.

You can see the appeal to those who have no qualms about taking other people’s content but who love to hoard data. Consider these Instagram statistics:

  • As of January 23 2017, there were 95m images being uploaded per day.
  • More than 40bn photos had been uploaded to Instagram as of that date.
  • The people uploading those photos are the preferred prey of image stealers: they’re young and quite often female. 31% of American women and 24% of American men use Instagram.
  • 59% of internet users between the ages of 18 and 29 use Instagram, as do 33% of internet users between the ages of 30 and 49.

The person who kicked off the project to rip every Instagram photo is -Archivist – one of the moderators of the r/DataHoarder subreddit. He told Motherboard that his real name is John, that he’s in his late 20s, and that when he’s not archiving Instagram, he’s “archiving something else”.

As in, for example, porn videos. Turns out he was one of the redditors who came up with a plan to test the ceiling of Amazon’s cloud storage plan, which was killed off in June. (The redditor beaston02 hit nearly 2 petabytes of porn, or about 293 viewing years’ worth of smut, by the time Amazon pulled the plug.)

John first posted his idea to create a distributed Instagram archive on January 5. At that point he had already ripped, by himself, the posts from some 3,400 accounts: about 2.2m files, totaling roughly 633GB.

By now, after other redditors joined in, the archive has swelled to around 580TB of Instagram posts.

He did it with an open-source program called RipMe, which downloads albums in bulk and pulls in images and videos from public Instagram accounts. It was a sluggish way to do it, though, John told Motherboard:

You can go to anybody’s profile and list their followers, but this list is loaded around 20 accounts at a time. So manual collection of usernames required me to scroll for hours. I initially overcame this by literally stuffing a bit of cardboard into my ‘page down’ key and walking away from my laptop.

We’ve seen others, including Danish researchers who amassed personal data on 70,000 OKCupid users, use scrapers – automated tools – to download user data from websites. We’ve also seen sketchy third-party apps going after Snapchat user data via its public API, and we’ve seen Tinder’s API used by researchers to grab 40,000 profile pictures.

But here’s the thing with relying on APIs to pull in people’s data without their permission: that spigot can be turned off, leaving you high and dry.

Not so the Instagram archival project. As John emphasized in an update to his initial post, the project doesn’t rely on Instagram’s API. Instead, it relies on John and his initial dataset, plus the 30 to 40 people now involved (along with their valuable storage space), plus (and here’s the cherry on top) a few dozen lines of code that enable the collection of photos from around 2m accounts every 24 hours.

The “vast majority” of images are from public accounts, Motherboard reports. But there are photos from private accounts, as well: John chiseled them out of their accounts by creating an Instagram bot programmed to seek out and follow private accounts in the hope that they’d follow the bot back, after which the private contents could be slurped up and added to the archive.

John said the bot has had a 70% success rate at getting followed.

Which leads us back to the advice cited above: to protect your Instagram account from getting ransacked, be careful about whom you friend. It’s all too easy to friend a bot that wants to raid your content and suck up to your friends so it can expand its reach.

There’s more you can do, too: after the Instagram API sprang a leak and hackers stole all those high-profile user details, we passed along five additional ways to keep your Instagram profile safe.