Tinder orders researcher to remove dataset of 40,000 profile pictures

Following a privacy kerfuffle, Tinder told a developer to remove a dataset of 40,000 of its users’ images that he had published in six downloadable zip files and released under a CC0: Public Domain License.

The dataset was called People of Tinder.

The developer, Stuart Colianni, who not-so-charmingly referred to the Tinder users as “hoes” in his source code, was using the images to train artificial intelligence.

The Kaggle page where he published the dataset now returns a 404. But you can still get at the script Colianni used to scrape the data: he uploaded TinderFaceScraper to GitHub.

Before the dataset came down, Colianni said that he had created it by using Tinder’s API to scrape 40,000 profile photos, evenly split between genders, from Bay Area users of the dating app.

Tinder’s API is notoriously vulnerable to being exploited. Not only has it been used to promote a movie, it’s also been abused to expose users’ locations and to auto-like all female profiles. (That last one evolved from homemade hack into an actual, full-fledged app for the devotedly indiscriminate.)

Then too, there was the guy-on-guy prank: the one where a programmer rigged the app with bait profiles, identified men who “liked” the phony female photos, and set them up to fling lust-filled come-ons at each other.

At any rate, Colianni’s Tinder face grab isn’t the first time we’ve seen developers make off with large facial image datasets without bothering to ask whether the people behind those images actually want to be involved in their research project.

Earlier mass face grabs include one from February, when we learned about a facial recognition startup called Pornstar.ID – a reverse-image lookup for identifying porn actors – that trained its neural network on upwards of 650,000 images of more than 7,000 female adult performers.

Did those performers consent to being identified and listed on the Pornstar.ID site? Did they agree to having their biometrics scanned so as to train a neural network? Is there any law that says their images, which are presumably published online for all to see (or purchase), aren’t up for grabs for the purpose of training facial recognition deep learning algorithms?

The same questions apply to the Tinder face grab. And the answers are the same: there are indeed laws concerning face recognition.

The Electronic Privacy Information Center (EPIC) considers the strongest of them to be the Illinois Biometric Information Privacy Act, which prohibits the use of biometric recognition technologies without consent.

In fact, much of the world has banned face recognition software, EPIC points out. In one instance, under pressure from Ireland’s data protection commissioner, Facebook disabled facial recognition in Europe: recognition it was doing without user consent.

When Tinder users agree to the app’s Terms of Use, they thereby grant it a “worldwide, transferable, sub-licensable, royalty-free, right and license to host, store, use, copy, display, reproduce, adapt, edit, publish, modify and distribute” their content.

What isn’t clear is whether those terms apply here, with a third-party developer scraping Tinder data and releasing it under a public domain license.

Tinder said that it shut down Colianni for violating its terms of service. Here’s what Tinder said to TechCrunch:

“We take the security and privacy of our users seriously and have tools and systems in place to uphold the integrity of our platform. It’s important to note that Tinder is free and used in more than 190 countries, and the images that we serve are profile images, which are available to anyone swiping on the app. We are always working to improve the Tinder experience and continue to implement measures against the automated use of our API, which includes steps to deter and prevent scraping.

“This person has violated our terms of service (Sec. 11) and we are taking appropriate action and investigating further.”

Indeed, Sec. 11 describes two relevant actions that are verboten:

You will not:

  • …use any robot, spider, site search/retrieval application, or other manual or automatic device or process to retrieve, index, “data mine”, or in any way reproduce or circumvent the navigational structure or presentation of the Service or its contents.
  • …post, use, transmit or distribute, directly or indirectly, (eg screen scrape) in any manner or media any content or information obtained from the Service other than solely in connection with your use of the Service in accordance with this Agreement.

So sure, yes, turning off Colianni’s access makes sense: he was scraping/data mining for purposes outside of Tinder’s terms of use.

My question: why has Tinder taken this long to shut off this type of activity?

I’m thinking here of Swipebuster: the app that promised to find out – for $4.99 – if your friends and/or lovers are using/cheating on you with Tinder… including letting you know when they used the app last, whether they’re searching for women or men, and their profile photo and bio.

Swipebuster was in the news a year ago. At the time, Tinder was just fine with developers lapping at the faucet of its free-flowing API. Hey, if you want to shell out the money, it’s up to you, Tinder said. After all, it’s all public information, it said at the time:

“… searchable information on the [Swipebuster] website is public information that Tinder users have on their profiles. If you want to see who’s on Tinder we recommend saving your money and downloading the app for free.”

What’s changed between then and now? How is using the face dataset to train facial recognition AI different from Swipebuster’s catch-the-cheaters pitch? It’s all still public information, after all.

Is access to the API now restricted to prevent apps from scraping users’ images? Or did Tinder just shut down this one researcher? What’s the thinking, here, on how Colianni’s use of Tinder users’ faces was egregious, but Swipebuster’s use was just fine?

I asked. Tinder responded by sending the same statement that it sent to TechCrunch.