How mobile apps leak user data that’s supposedly off-limits


How free are “free” mobile apps?

Not at all, of course, just like their “free” online brethren.

Mobile apps and online services such as Facebook, Google et al. might not cost anything, but they come at the cost of having our privacy picked over by voracious ad networks.

Researchers at the School of Computer Science at the Georgia Institute of Technology recently delved into just how much data users are giving away to pay for free mobile apps.

Their findings: a lot more than what you’d imagine by reading, say, Google’s privacy policy.

As described in a recently released paper titled The Price of Free: Privacy Leakage in Personalized Mobile In-App Ads, the researchers found that in-app advertising is leaking potentially sensitive personal information on millions of mobile phone users, including how much money we make, whether or not we’ve got kids, and what our political leanings are.

We have a permeable membrane between ad networks and mobile app developers to thank for all this dribbling.

How that leaky membrane works

From Georgia Tech’s press release:

  • Mobile app developers choose to accept in-app ads inside their app
  • Ad networks pay a fee to app developers in order to show ads and monitor user activity: collecting app lists, device models, geolocations, etc. This aggregate information is made available to help advertisers choose where to place ads
  • Advertisers instruct an ad network to show their ads based on topic targeting (such as “Autos & Vehicles”), interest targeting (such as user usage patterns and previous click throughs), and demographic targeting (such as estimated age range)
  • The ad network displays ads to appropriate mobile app users and receives payment from advertisers for successful views or click throughs by the recipient of the ad
  • In-app ads are displayed unencrypted as part of the app’s GUI. Therefore, mobile app developers can access the targeted ad content delivered to its own app users and then reverse-engineer that data to construct a profile of their app customer
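The last step, turning observed ads into a profile, can be sketched in a few lines. This is only an illustration under stated assumptions: the category labels, the `CATEGORY_HINTS` table and the `build_profile` function are all hypothetical, and the researchers' actual inference method may look nothing like this simple vote count.

```python
from collections import Counter

# Hypothetical mapping from an ad category to a demographic hint.
# Invented for illustration; not taken from the paper or from AdMob.
CATEGORY_HINTS = {
    "baby_products": ("parental_status", "parent"),
    "toys": ("parental_status", "parent"),
    "retirement_planning": ("age_group", "55+"),
    "student_loans": ("age_group", "18-24"),
}


def build_profile(observed_ad_categories):
    """Tally the hints carried by each observed ad and keep the top guess per attribute."""
    votes = {}
    for category in observed_ad_categories:
        hint = CATEGORY_HINTS.get(category)
        if hint is None:
            continue  # this ad category tells us nothing in the toy table
        attribute, value = hint
        votes.setdefault(attribute, Counter())[value] += 1
    return {attr: counter.most_common(1)[0][0] for attr, counter in votes.items()}


# Ads the hosting app saw rendered inside its own GUI (made-up data).
ads_seen = ["baby_products", "toys", "student_loans", "car_insurance", "baby_products"]
profile = build_profile(ads_seen)
print(profile)
```

The point of the sketch is that the hosting app never needs the ad network's data directly: the ads themselves carry the signal.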

To test what’s being leaked, researchers created a custom-built Android app that they installed on more than 200 participants’ phones.

Then, they reviewed the accuracy of personalized ads served to test subjects from the Google mobile ad network, AdMob, based on their personal interests and demographic profiles.

The researchers note that as far as they know, this is the first study to suggest that demographics play a key role in determining what ads we’re fed, as opposed to just our interests.

They found that more than 57% of ad impressions for 41% of the users match users’ interests, but even more match their demographics: more than 73% of ad impressions for 92% of users are correlated with users’ demographic information.
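Statistics with two percentages in them are easy to misread. One plausible reading, sketched below, is: compute a match rate per user, then count how many users exceed a threshold. The function name and the toy data are assumptions made for illustration, and the paper may define its metric differently.

```python
def share_of_users_above(match_flags_by_user, threshold):
    """Fraction of users whose per-user ad match rate exceeds `threshold`."""
    rates = {
        user: sum(flags) / len(flags)
        for user, flags in match_flags_by_user.items()
        if flags  # skip users with no impressions
    }
    above = [user for user, rate in rates.items() if rate > threshold]
    return len(above) / len(rates)


# True means the impression matched the user's profile (made-up data).
toy_impressions = {
    "user_a": [True, True, True, False],     # 75% match rate
    "user_b": [True, False, False, False],   # 25% match rate
    "user_c": [True, True, False, False],    # 50% match rate
    "user_d": [True, True, True, True],      # 100% match rate
}
share = share_of_users_above(toy_impressions, 0.57)
print(f"{share:.0%} of users have a match rate above 57%")
```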

They also found that a mobile app developer could learn these things about a user from the ads shoveled onto their phone:

  • Gender, with 75% accuracy
  • Parental status, with 66% accuracy
  • Age group, with 54% accuracy
  • Income, political affiliation, and marital status, with higher accuracy than random guesses
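To put those accuracy figures in context, it helps to compare them with what blind guessing would achieve. The sketch below assumes a uniform random guess over equally likely classes, which is a simplification: the baseline the researchers used is not spelled out here, and real demographic groups are rarely balanced. All names and data are illustrative.

```python
def accuracy(predicted, actual):
    """Share of predictions that equal the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def uniform_chance(num_classes):
    """Expected accuracy of guessing uniformly at random among equally likely classes."""
    return 1 / num_classes


# Made-up labels for four users.
predicted_gender = ["f", "m", "f", "f"]
actual_gender = ["f", "m", "m", "f"]

observed = accuracy(predicted_gender, actual_gender)
baseline = uniform_chance(2)
print(f"accuracy {observed:.0%} vs. chance {baseline:.0%}")
```

Under that assumption, a figure such as 54% for age group means more the more age groups there are to choose from.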

Note that Google deems some demographic identifiers – including race, religion, sexual orientation or health – to be so sensitive that it explicitly rules out using them for ad shoveling.

From Google’s privacy policy:

We use information collected from cookies and other technologies, like pixel tags, to improve your user experience and the overall quality of our services. One of the products we use to do this on our own services is Google Analytics. For example, by saving your language preferences, we’ll be able to have our services appear in the language you prefer.

When showing you tailored ads, we will not associate an identifier from cookies or similar technologies with sensitive categories, such as those based on race, religion, sexual orientation or health.

In practice, though, in-app advertising opens up a new channel for leaking personal information (age, gender, whether users have kids, income, political affiliation, marital status) to anybody who can access the ads, even though that demographic information is supposedly not used for personalization.

From the paper:

This finding shows that in in-app advertisement settings, a guarantee from Google is no longer enough for protecting the user’s privacy, since user information that Google uses for personalization can be inadvertently leaked to any third party that host[s] Google ads, and Google has no control over how such leaked information can be used to derive more sensitive information about the user.

The researchers found that the root cause of the privacy leakage is the lack of isolation between the ads and mobile apps. Adopting HTTPS wouldn’t fix that: encryption protects ad traffic in transit, but the hosting app can still read the ads once they’re displayed inside it.

They point to previous work that highlighted the need to isolate ad libraries largely from the perspective of separating permissions of ad-related code from the code of the hosting app.

But in addition, their work shows there’s also a need to prevent the hosting app from reading the ad library’s data when that data is derived from the ad-network’s private information, they concluded.

They suggest that ad providers should build defense mechanisms into their products to protect users’ privacy, such as noise or randomness added to personalized results, similar to what’s been suggested for protecting privacy around people’s search histories.
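A minimal sketch of the noise idea follows. It assumes the simplest possible scheme: with some probability the network serves a random ad instead of the targeted one. The `choose_ad` function, the `noise_rate` parameter and the ad pool are all hypothetical, and the researchers do not prescribe this particular mechanism.

```python
import random


def choose_ad(targeted_ad, ad_pool, noise_rate, rng):
    """Return the targeted ad, or with probability `noise_rate` a uniformly random ad."""
    if rng.random() < noise_rate:
        return rng.choice(ad_pool)  # decoy that says nothing about the user
    return targeted_ad


# Made-up ad categories; the seeded generator keeps the run repeatable.
pool = ["baby_products", "car_insurance", "student_loans", "travel", "gaming"]
rng = random.Random(0)
served = [choose_ad("baby_products", pool, 0.3, rng) for _ in range(1000)]
decoys = sum(ad != "baby_products" for ad in served)
print(f"{decoys} of {len(served)} impressions were off-target decoys")
```

The tradeoff is visible in the single parameter: a higher `noise_rate` blurs the profile an observer can build, but it also means fewer impressions reach the audience the advertiser paid for.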

Ad networks could also provide coarser grained targeting options for advertisers, the researchers suggested.

For example, rather than target 26-year-old users, ad networks might instead provide a range to target: say, 25 to 34. Google AdMob is already offering coarser ad targeting for age groups.
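Coarsening is simple to express in code. The sketch below maps an exact age to a range, using bucket boundaries chosen for illustration; they are an assumption, not a statement of what any ad network actually offers.

```python
# Illustrative age ranges (inclusive); not taken from any ad network's documentation.
AGE_BUCKETS = [(18, 24), (25, 34), (35, 44), (45, 54), (55, 64)]


def coarsen_age(age):
    """Replace an exact age with the label of the range it falls into."""
    for low, high in AGE_BUCKETS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "65+" if age >= 65 else "under 18"


print(coarsen_age(26))
```

An advertiser targeting the "25-34" label learns, at most, that a user falls somewhere in a ten-year span, and so does anyone who reverse-engineers the ads.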

How likely is it that ad networks would smudge the precision of their ad personalization and thereby potentially threaten their ad revenues, just to protect our data privacy?

Good question! But hey, the researchers said, it’s worth throwing onto the table:

We will leave it as an open problem to identify a strategy that can avoid such tradeoff and still work in the current ad-hosting environment (where there is no isolation between the logic/data of the ad-library and the main app).
