Cookies are an essential part of the way the web works and occupy a pivotal position in the online privacy arms race. Organisations who want to track and profile people give them cookies and users who don’t want to be tracked disable or delete them.
But what if there was a cookie you couldn’t delete, and what if the steps you took to guard your privacy made you easier to track?
That is the spectre raised by a report, authored by the Electronic Frontier Foundation (EFF), entitled ‘How Unique is Your Web Browser?’
The report uses data gathered by a tool called Panopticlick that determines how easy you are to identify based on your web browser’s ‘fingerprint’.
Uniqueness is important because organisations can only track people when they can tell one user from another.
The most common form of tracking involves giving users cookies with unique IDs. Each time a user visits a page that has code from the cookie’s original domain their web browser returns the cookie, and its unique ID, with the request.
This simple mechanism allows organisations to track users for weeks or months, from one page of a website to the next or even across multiple websites if those sites share code.
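The mechanism can be sketched in a few lines of Python using only the standard library. This is an illustration rather than any real tracker's code: the cookie name `uid`, the 90-day lifetime, the `visits` log and the `handle_request` helper are all assumptions made for the example.

```python
import uuid
from http.cookies import SimpleCookie

# In-memory visit log keyed by tracking ID (a stand-in for a tracker's database).
visits = {}

def handle_request(cookie_header, page):
    """Return (user_id, set_cookie_value_or_None) for one page request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if "uid" in cookie:
        # Returning visitor: the browser sent the ID back with the request.
        user_id = cookie["uid"].value
        set_cookie = None
    else:
        # New (or cookie-less) visitor: mint a fresh unique ID.
        user_id = uuid.uuid4().hex
        out = SimpleCookie()
        out["uid"] = user_id
        out["uid"]["max-age"] = 60 * 60 * 24 * 90  # hypothetical 90-day lifetime
        set_cookie = out["uid"].OutputString()
    visits.setdefault(user_id, []).append(page)
    return user_id, set_cookie
```

Calling `handle_request(None, "/a")` issues a new ID, and a later call that passes the header `"uid=<that ID>"` is logged against the same history. Delete the cookie and the next call mints a fresh ID with no history attached.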
However, for people wanting to track you online, cookies have three major weaknesses: you can see them, you can see who gave them to you and you can delete them. If you delete a website’s cookies then the unique ID it gave you has gone and you’ll appear to it as a new and unknown user without a tracking history.
The holy grail for people who want to track you against your wishes is to find a unique ID you don’t know you have or that you can’t interfere with.
The EFF set out to discover if browser fingerprinting could provide just such an ID.
Browser fingerprinting looks at the combination of information your browser voluntarily hands over about itself when it opens a web page.
Although the web has hundreds of millions of users, most of them are using the latest versions of about five different browsers. With so little variation you might assume your browser is easily lost in the herd.
You couldn’t be more wrong.
According to the EFF’s research, your browser fingerprint is likely to be very distinct indeed:
In this sample of privacy-conscious users, 83.6% of the browsers seen had an instantaneously unique fingerprint...
...if we pick a browser at random, at best we expect that only one in 286,777 other browsers will share its fingerprint. Among browsers that support Flash or Java, the situation is worse ... 94.2% of browsers with Flash or Java were unique in our sample.
I tested two of my own devices using the EFF’s Panopticlick tool. My tablet shared a fingerprint with one in every 875,000 visitors and my laptop’s browser was completely unique amongst 4.4 million browsers.
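One way to read "one in N" figures like these is as bits of identifying information: each bit halves the crowd you could be hiding in. The short Python calculation below shows the conversion. The helper name `surprisal_bits` is mine, not the report's.

```python
import math

def surprisal_bits(one_in_n):
    """Bits of identifying information when 1 in `one_in_n` browsers share a fingerprint."""
    return math.log2(one_in_n)

# The EFF's "one in 286,777" figure.
print(round(surprisal_bits(286_777), 1))  # 18.1

# The two devices tested above.
print(round(surprisal_bits(875_000), 1))
print(round(surprisal_bits(4_400_000), 1))
```

Roughly 18 bits is enough to pick one browser out of a few hundred thousand, which is why a handful of innocuous-looking settings add up so quickly.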
The sample size for the research paper was 470,161 users. To make matters worse the EFF acknowledges that its sample is likely to be biased in favour of people who are already privacy conscious.
While our sample of browsers is quite biased, it is likely to be representative of the population of internet users who pay enough attention to privacy to be aware of the minimal steps ... generally agreed to be necessary to avoid having most of one's browsing activities tracked...
Panopticlick creates its fingerprint from just eight pieces of information that are freely shared by web browsers, such as your timezone, screen resolution, plugin choices and fonts.
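To make the idea concrete, here is a minimal Python sketch of how a set of attributes like these could be collapsed into a single identifier. It is not Panopticlick's actual code: the field names, the sample values and the choice of a truncated SHA-256 hash are all assumptions for illustration.

```python
import hashlib
import json

def fingerprint(attributes):
    """Hash a dict of browser attributes into a short, stable identifier."""
    # Sort keys so the same attributes always serialise the same way.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical values for eight attributes a browser might reveal.
browser = {
    "user_agent": "ExampleBrowser/1.0 (ExampleOS)",
    "http_accept": "text/html,*/*;q=0.8",
    "cookies_enabled": True,
    "screen": "1280x800x24",
    "timezone_offset_minutes": -60,
    "plugins": ["PluginA 1.2", "PluginB 10.0"],
    "fonts": ["Arial", "Courier New", "Times New Roman"],
    "supercookies": "DOM localStorage: yes",
}

print(fingerprint(browser))
```

No single value here is remarkable. It is the combination that is distinctive, and changing any one value (a new plugin version, say) produces a completely different hash.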
The research shows that browser fingerprints are probably unique enough to be used to ‘regenerate’ deleted cookies or even to replace tracking cookies entirely.
Although the EFF didn’t investigate if anyone is actually using browser fingerprinting in practice, they do note in the report that “…there are several companies that sell products which purport to fingerprint”.
Similar techniques, such as using ETags to regenerate cookies, certainly have been used in the wild.
Fingerprinting also raises an interesting dilemma for users who are particularly privacy conscious – browser customisations designed to make you harder to track might actually make you easier to fingerprint.
The paradox, essentially, is that many kinds of measures to make a device harder to fingerprint are themselves distinctive unless a lot of other people also take them.
So what’s a privacy conscious user to do?
Well, the first thing to remember is that whatever advanced techniques may or may not be in use, cookie tracking is still the one you’re most likely to encounter so don’t ditch cookie-munching plugins like Ghostery just yet.
If you’re worried about fingerprinting then you can reduce your distinctness by removing Java and Flash, and then taking greater control over the JavaScript your browser runs.
The report praises the NoScript plugin, a browser add-on that allows you to choose which JavaScript you want to run, as “… a useful privacy enhancing technology that seems to reduce fingerprintability.”
It also identifies the Tor project as “noteworthy for already considering and designing against fingerprintability.”
Perhaps most usefully though, you can test and re-test the fingerprint of whatever strategy you use to stay anonymous online.
Not sure this is true.
Quite possibly, a composite fingerprint (versions of common plugins, base os, browser version etc) is very accurate indeed.
The issue arises though – for how long? Because I don’t know about the majority of people here, but I patch my browser regularly, its plugins as soon as a new version is released, and my OS at irregular intervals. That would mean that continuity for my fingerprint may continue for some hours, possibly even some days, but will eventually change; at the same time, some other user may update and collide with *my* fingerprint, providing overlap for a period of time before I (and they) update again and the fingerprint is left abandoned... for now.
Section 5 of the report actually addresses this specifically.
As part of the test the EFF constructed a crude algorithm designed to identify new fingerprints that are actually an evolution of another fingerprint it’s already seen.
“…our heuristic made a correct guess in 65% of cases, an incorrect guess in 0.56% of cases, and no guess in 35% of cases. 99.1% of guesses were correct, while the false positive rate was 0.86%.”
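A much simplified Python sketch of that kind of heuristic is below. It is not the EFF's code: the rule used here (guess only when exactly one known fingerprint differs in exactly one field, and the changed value is still similar), the 0.85 threshold and the field names are assumptions for illustration.

```python
from difflib import SequenceMatcher

def guess_previous(new, known, threshold=0.85):
    """Return the single known fingerprint that `new` plausibly evolved from, or None."""
    candidates = []
    for old in known:
        changed = [k for k in new if new[k] != old.get(k)]
        if len(changed) == 1:
            candidates.append((old, changed[0]))
    if len(candidates) != 1:
        return None  # no guess: zero or several plausible ancestors
    old, field = candidates[0]
    # Accept only if the changed value still looks like an upgrade of the old one.
    similarity = SequenceMatcher(None, str(old.get(field, "")), str(new[field])).ratio()
    return old if similarity >= threshold else None

known = [
    {"ua": "Browser/3.6.3", "tz": "-60", "fonts": "Arial,Times"},
    {"ua": "Other/1.0", "tz": "0", "fonts": "Arial"},
]
upgraded = {"ua": "Browser/3.6.4", "tz": "-60", "fonts": "Arial,Times"}
print(guess_previous(upgraded, known) is known[0])  # True
```

Declining to guess when the evidence is ambiguous is what keeps the error rate low. The 99.1% in the quote follows from its first two numbers: 65 / (65 + 0.56) ≈ 0.991.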
I suspect that if fingerprinting is being used it will be as a way to regenerate cookies (we think we’ve seen this user before but their cookie seems to have gone, give it back to them). If it’s used in that role then it doesn’t have to be 100% accurate to be useful, it just has to have a low rate of false positives.
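A rough Python sketch of that regeneration role, purely as an illustration. The `identify` helper and the `ids_by_fingerprint` table are hypothetical, and the fingerprint is assumed to arrive as an already-computed string.

```python
import uuid

# Last tracking ID seen for each fingerprint (a stand-in for a tracker's database).
ids_by_fingerprint = {}

def identify(cookie_uid, fp):
    """Return the tracking ID for a request, respawning it from the fingerprint if the cookie is gone."""
    if cookie_uid is None:
        # Cookie deleted or never set: reuse the ID last seen with this fingerprint, if any.
        cookie_uid = ids_by_fingerprint.get(fp) or uuid.uuid4().hex
    # Remember (or refresh) the link between this fingerprint and the ID.
    ids_by_fingerprint[fp] = cookie_uid
    return cookie_uid
```

A visitor who clears their cookies but returns with the same fingerprint is handed their old ID back. A false positive here would merge two people's histories, which is why the false positive rate matters more than overall accuracy.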
“doesn’t have to be 100% accurate”
Doesn’t that (and all sorts of other considerations) make it fall foul of UK/EU data protection principles?
DPP1 Personal data shall be processed fairly …
DPP2 Personal data shall be obtained only for one or more specified and lawful purposes …
DPP3 Personal data shall be adequate, relevant and not excessive …
DPP4 Personal data shall be accurate …
DPP5 Personal data processed for any purpose or purposes shall not be kept for longer than is necessary …
DPP6 Personal data shall be processed in accordance with the rights of data subjects …
DPP7 Appropriate technical and organisational measures shall be taken against unauthorised or unlawful processing of personal data …
DPP8 Personal data shall not be transferred to a country or territory outside the European Economic Area unless
Sounds as if we need a test case.
I think that is actually what protects it against the UK/EU DPPs: because it isn’t 100% accurate and doesn’t record information that’s actually personal (just a variety of generically unique information), it could be argued that no personally identifying information is collected in this situation, even if the information could be personally identifying in the majority of the cases where it is used.
However, as soon as someone starts linking credit card details, addresses, names, etc. to this information instead of using it as a tracking beacon only, the privacy laws should be in full force.
I figure my browsers are among the 35% group: I’ve found that due to the way most of them are configured, I get a different fingerprint every time I use Panopticlick. On some of my browsers, I am uniquely identified every time, but that unique identification is always different, which makes tracking rather difficult.
So one way to “hide in the herd” here is to appear to be a different animal every time; that makes it rather difficult to track, more so than looking just like all the other marketing targets.
Would you care to tell the group what it is about your browsers that makes them appear with a unique fingerprint each time you visit Panopticlick?
I certainly like the idea of an ever-changing browser impression versus hoping that my browser is exactly like most other browsers out there.
So essentially the next step in browser privacy is data reduction. Firefox should reduce timezones to the minimum set (eg my iPad says my “timezone” is “Rome, Italy” which is unnecessarily specific – it’s the same time in multiple European countries, so Firefox could reduce all such timezone values to a single item) and screw down a standard set of fonts, and pretend there’s nothing else installed on the system (a set of free OpenType fonts, naturally). Reducing or eliminating these two variables would drastically reduce the uniqueness of this fingerprint, I would have thought.
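As a rough illustration of that kind of data reduction, here is a Python sketch. The attribute names, the `STANDARD_FONTS` baseline and the `reduce_attributes` helper are all hypothetical; no browser is claimed to work this way.

```python
# Hypothetical baseline font set that every browser would report.
STANDARD_FONTS = ["Arial", "Courier New", "Times New Roman"]

def reduce_attributes(attrs):
    """Coarsen the attributes a browser would reveal to a website."""
    return {
        # Report only the UTC offset, not the named city.
        "utc_offset_minutes": attrs["utc_offset_minutes"],
        # Report the baseline set and pretend nothing else is installed.
        "fonts": list(STANDARD_FONTS),
    }

rome = {"timezone_name": "Rome, Italy", "utc_offset_minutes": 60,
        "fonts": ["Arial", "Comic Sans MS", "Futura"]}
berlin = {"timezone_name": "Berlin, Germany", "utc_offset_minutes": 60,
          "fonts": ["Arial", "Helvetica Neue"]}

print(reduce_attributes(rome) == reduce_attributes(berlin))  # True
```

Two browsers that were distinguishable by city and font list now report identical values for those two attributes, so neither attribute helps to tell them apart.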
I wonder if there isn’t a place for a plugin or browser that allows you to control specific parts of the javascript api rather than just toggling javascript on or off.
4 out of the 8 items used in this fingerprint rely on javascript (things like screen resolution and polling for what plugins are installed).
You can simplify your fingerprint by turning off javascript but 99% of javascript isn’t the problem and won’t help somebody fingerprinting you.
Perhaps what’s needed is something like the control that exists for accessing geographic location via javascript. Users have to opt-in each time their location is requested, regardless of the browser they use.
If your permission was required for sites to use a few small parts of the javascript api you could use most sites without any problem and exercise a great deal of control over your fingerprint.
And it’s not just fingerprinting where it could apply.
We’ve seen from other stories that users are often horrified when they discover that javascript (and therefore more or less any web page) can be used to track your keystrokes and where your mouse is when you’re on that site. They’d make good candidates for off-by-default parts of the api.
I give this one a thousand thumbs up!
Sounds like the Browser developers need to step up to the mark and by default prevent reporting of so much data about browsers to websites.
And website developers should stop trying to be so damned clever that they then claim they need to know about whether a specific font or specific plugin is installed or not.
You’re missing the point. Developers don’t care which font or plugin you are using – except maybe to choose the best features and style sheet for your browser. They use the data to say, ‘I know who this is’.
In my residential development of 182 homes, there are only 5 floor plans, 5 outside colors and one cladding type. I can still tell each home apart because no two have exactly the same plants, lawn furniture, trim color or car type in the drive. Even if one updates his car or removes a tree, I can factor that in and still ‘know who that is’.
I think you *may* have missed my point.
The reason that browsers are designed to report so much “browser environment” is so that (non malicious) website developers can tell lots about your browser (tell “what it is”) so that they can send clever web-pages to you. Going beyond the “best features and style sheet”.
The cost (to us) of making it easy for them to be “too clever by half” is the potential privacy loss – and giving websites the ability to determine, as you say, “who that is”. This privacy hole was surely not intended by the mainstream browser manufacturers (at least most of them!); it is surely an unintended consequence of providing so much browser environment information.
But is all this “cleverness” worth it? Websites should “work” irrespective of specific plugins or fonts etc. If we (users) were prepared to accept slight imperfections arising from “wrong fonts” etc., there should be no need for web servers to request this information.
So if browsers were incapable of delivering much of this information to web servers we would not have this privacy hole.
Can we even thin down the information content of the USER_AGENT and HTTP_ACCEPT headers?
I first learned about Ghostery and NoScript thanks to NakedSecurity. Glad to know that following the advice I found here is still a smart move.
By way of reader feedback, the sensationalist articles by some non-Sophos contributors are of little use, but articles like this one are very helpful and informative.
Thanks!
I tested Firefox with NoScript enabled and did not allow EFF, and my “uniqueness” dropped from one in approx 4.38 million to one in 84k. Both Safari and Chrome tested at approximately one in 4.3806 million. I have settings to reject 3rd party cookies and have only JavaScript enabled on all browsers. True, Flash exists on the system. So, like the author, I guess I am unique.