Massive new study lifts the lid on top websites’ tracking secrets


So, just how tracked are you? Plenty, according to the largest, most detailed measurement of online tracking ever performed: Princeton University’s automated review of the world’s top 1,000,000 sites, as listed by Alexa.

But you probably knew there’s a whole lotta trackin’ goin’ on. What’s interesting (and sometimes surprising) are the details. Princeton’s Steven Englehardt and Arvind Narayanan have captured the clearest picture of third-party web tracking that we’ve ever seen.

To begin, huge numbers of folks are trying to track you: 81,000+ third-party trackers appeared on at least two of the top million sites.

However, only 123 trackers showed up on at least 1% of those sites:

“The number of third parties that a regular user will encounter on a daily basis is relatively small. [Moreover], all of the top 5 third parties, as well as 12 of the top 20, are Google-owned… Google, Facebook, and Twitter are the only third-party entities present on more than 10% of sites.”
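To make the arithmetic behind figures like "at least 1% of those sites" concrete, here is a minimal Python sketch of a prevalence count. It is not the researchers' code: the function names, the toy crawl data, and the 50% threshold are all made up for illustration.

```python
from collections import Counter


def third_party_prevalence(site_to_third_parties):
    """Return {third_party: fraction of crawled sites it appears on}."""
    counts = Counter()
    for third_parties in site_to_third_parties.values():
        # A tracker counts once per site, however many requests it makes there.
        counts.update(set(third_parties))
    total_sites = len(site_to_third_parties)
    return {party: n / total_sites for party, n in counts.items()}


def parties_at_or_above(prevalence, threshold):
    """Return the third parties present on at least `threshold` of sites."""
    return sorted(p for p, share in prevalence.items() if share >= threshold)


# Hypothetical toy crawl: four first-party sites and the third parties seen on each.
crawl = {
    "news.example": ["ads.example", "analytics.example", "social.example"],
    "sports.example": ["ads.example", "analytics.example"],
    "uni.example": ["analytics.example"],
    "blog.example": ["ads.example", "rare-tracker.example"],
}

prevalence = third_party_prevalence(crawl)
widespread = parties_at_or_above(prevalence, 0.5)
print(widespread)
```

On a real crawl the threshold would be 0.01 for the 1% cut-off, and the long tail of trackers seen on only a handful of sites would fall away just as it does in the study.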

Englehardt and Narayanan find “a trend towards economic consolidation” – fewer but larger third-party trackers. In their opinion, that’s actually good news for privacy advocates, as these “are large enough entities that their behavior can be regulated by public-relations pressure and the possibility of legal or enforcement actions.”

Any evidence for that optimism? Maybe you remember our coverage of the uproar surrounding the use of Adobe Flash Local Shared Objects to respawn HTTP cookies you chose to delete. According to Englehardt and Narayanan, this controversy has led many large third-party trackers to abandon the practice.

So… according to the Princeton review, who tracks most? That’ll be news, arts, and sports sites, which typically provide content for free and “lack an external funding source, [and] are pressured to monetize page views with significantly more advertising.”

And who tracks least? “Mostly sites which belong to government organizations, universities, and non-profit entities… websites [that] may be able to forgo advertising and tracking due to the presence of funding sources external to the web.” Oh, and adult sites, too.

Next, Englehardt and Narayanan turned to fingerprinting: techniques for individually identifying anonymous site visitors based on the unique characteristics of their hardware and software. (We have covered fingerprinting in detail in an earlier primer.) The researchers wanted to know: Is it really being used in the wild? How widely? Which techniques?
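If the idea of fingerprinting sounds abstract, this minimal Python sketch shows the general principle: combine several attributes, none unique on its own, into one stable identifier. The attribute names and values are hypothetical, and real trackers gather these values with in-browser JavaScript rather than Python.

```python
import hashlib
import json


def fingerprint(attributes):
    """Hash a dict of device/browser attributes into a short, stable identifier."""
    # Sorting keys makes the result independent of the order attributes were collected in.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical attribute values, for illustration only.
visitor_a = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "UTC-5",
    "fonts": ["Arial", "Courier"],
}
# The same visitor returning later, with the attributes collected in a different order.
visitor_a_again = dict(reversed(list(visitor_a.items())))
# A second visitor who differs only by one extra installed font.
visitor_b = {**visitor_a, "fonts": ["Arial", "Courier", "Comic Sans"]}

print(fingerprint(visitor_a) == fingerprint(visitor_a_again))
print(fingerprint(visitor_a) == fingerprint(visitor_b))
```

No cookie is stored, so clearing cookies does not change the identifier. A single differing attribute, such as one extra font, is enough to tell two otherwise identical visitors apart.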

They began with HTML Canvas fingerprinting, reflecting subtle differences in the way browsers and devices render HTML5 Canvas-based images. Canvas fingerprinting showed up on 14,371 sites – far more than a similar measurement in 2014.

As with Flash cookie respawning:

“…the most prominent trackers have by-and-large stopped using it, [following a public backlash].”

However, many more small third-party trackers are now using Canvas fingerprinting:

“…obscure trackers… less concerned about public perception.”

You may be glad to hear that the applications for Canvas fingerprinting have shifted too: less behavioral tracking, more fraud detection.

Englehardt and Narayanan also performed the first large-scale search for several additional, provocative forms of device fingerprinting:

  • Previously unknown AudioContext fingerprinting, exploiting differences in how browsers process audio (found on 67 sites)
  • WebRTC Local IP Discovery, in which browsers routinely reveal network addresses to web applications, including the local IP addresses of Ethernet and Wi-Fi interfaces, as well as addresses from the public side of NAT connections (found on 715 sites)
  • Fingerprinting of device font lists (found on 3,250 sites)
  • Battery fingerprinting, based on the surprising discovery that your browser will report unique information associated with your device’s battery status (found in two third-party scripts)
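To see why WebRTC address discovery matters, consider what a script can do once it holds those addresses. The Python sketch below sorts a hypothetical list of observed addresses into local-network and public ones using the standard `ipaddress` module. It only illustrates the distinction and is not how the study detected the technique; the function name and sample addresses are assumptions.

```python
import ipaddress


def classify_addresses(candidates):
    """Split IP address strings into local-network and public groups."""
    groups = {"local": [], "public": []}
    for text in candidates:
        address = ipaddress.ip_address(text)
        # is_private covers ranges such as 10.0.0.0/8 and 192.168.0.0/16.
        key = "local" if address.is_private else "public"
        groups[key].append(text)
    return groups


# Hypothetical addresses a page might learn: two interface addresses behind
# a home router, plus one public address (8.8.8.8 is a stand-in here).
observed = ["192.168.1.23", "10.0.0.5", "8.8.8.8"]
groups = classify_addresses(observed)
print(groups)
```

The local addresses are the interesting part for a tracker: many visitors can share one public address behind the same NAT, but the combination of public and local addresses narrows things down considerably.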

Englehardt and Narayanan say privacy tools like Ghostery do a nice job of protecting against standard tracking scripts from widely-used third-party trackers. However, they sometimes miss more obscure scripts using these emerging, exotic techniques.

To capture their massive datasets, the researchers built a complete web privacy measurement framework, OpenWPM. Unlike most earlier privacy testing tools, it’s built to operate at any scale, up to millions of sites.

Since they’ve open-sourced OpenWPM, anyone can use it. That includes academics: it’s already been part of seven published studies. It also includes site owners who want to know what third-party trackers are doing on their sites. And it especially includes journalists and activists.

By shining a light on privacy at scale, and making it easier for others to do so, Englehardt and Narayanan believe they can drive better behavior. And that can only be a good thing.