HealthCare.gov, the US federal health insurance exchange website, is inadvertently sending users’ personal health information to fourteen separate third-party websites.
The site, a central component of the Affordable Care Act (often referred to as Obamacare), leaks data via referer headers.
According to reports from the Associated Press and the Electronic Frontier Foundation (EFF), the data being sent to third-party websites includes zip code, age, income, and whether or not you’re pregnant or a smoker.
The health data is being leaked in referer headers because, rather unwisely, HealthCare.gov includes that health data in its own URLs.
It works like this:
When browsers request a web page, the request includes the URL of the page that the request was referred from in its referer [sic] header (the misspelling is enshrined in the official HTTP specification).
If you went from this page to our Cookies and Scripts page, the request would look something like this:
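As a rough illustration, here is a minimal Python sketch that builds the text of such a request. Both URLs are hypothetical placeholders rather than this site's real addresses, the helper name is made up for the example, and a real browser would send several more headers than the two shown.

```python
from urllib.parse import urlsplit


def build_get_request(target_url: str, referring_url: str) -> str:
    """Return the text of a bare-bones HTTP/1.1 GET request for target_url."""
    target = urlsplit(target_url)
    path = target.path or "/"
    if target.query:
        path += "?" + target.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {target.netloc}",
        # The page the reader came from travels along in this header.
        f"Referer: {referring_url}",
        "",
        "",
    ]
    return "\r\n".join(lines)


# Hypothetical URLs standing in for the page being requested and "this page".
request_text = build_get_request(
    "https://blog.example/cookies-and-scripts/",
    "https://blog.example/2015/01/healthcare-gov-leaks/",
)
print(request_text)
```

The line to look at in the output is the one starting with Referer: it carries the full address of the page you were on when you clicked the link.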
No problem so far: that’s how HTTP is supposed to work, and the referer header is even occasionally useful.
However, the page you think you’re going to isn’t the only page that gets referral data.
If the web page you’re on contains third party code like a Twitter widget then your browser has to get that code from the third party website, and that request has a referer header too.
So, via the referer header, your browser is telling Twitter what web page you’re on when it asks for an embedded Twitter widget (and the same is true of any other third party code).
Whether you know it or not, and whether they’re listening or not, you’re sharing which bit of the web you’re on at that moment with all the third party code used to build up a web page.
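To make that concrete, here is a small Python sketch that works out which third party hosts would hear about the page you're on. It assumes the browser sends the full page URL as the referer, as described above (a browser configured with a stricter referrer policy would send less), and every hostname in it is invented for the example.

```python
from urllib.parse import urlsplit


def third_party_recipients(page_url: str, resource_urls: list[str]) -> dict[str, str]:
    """Map each third-party host to the Referer value it would receive.

    Assumption: the browser sends the full page URL as the referer.
    """
    page_host = urlsplit(page_url).hostname
    recipients = {}
    for resource in resource_urls:
        host = urlsplit(resource).hostname
        # Anything fetched from a different host counts as third party here.
        if host and host != page_host:
            recipients[host] = page_url
    return recipients


# Hypothetical page and the resources embedded in it.
page = "https://news.example/story?id=42"
resources = [
    "https://news.example/style.css",
    "https://widgets.social.example/follow.js",
    "https://ads.tracker.example/pixel.gif",
]
recipients = third_party_recipients(page, resources)
for host, referer in recipients.items():
    print(f"{host} is told you are on {referer}")
```

Comparing hostnames is a simplification: it treats every different host as a third party, even a subdomain run by the same organisation.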
On the Obamacare site you share a whole lot more than just where you are though.
HealthCare.gov includes sensitive information about the person using the site in its URLs as a way of passing information from one page to another.
As mentioned before, the URLs contain information such as your age, zip code and income, and whether or not you’re a smoker or pregnant.
If you logged in to HealthCare.gov and visited a URL like that, you’d dispatch the whole URL, including all the personal data within it, to any sites providing third party code for that page.
In the case of HealthCare.gov that’s fourteen different websites, some of which belong to advertising companies who specialise in user profiling.
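Here is a short Python sketch of how little effort it takes for the recipient of a referer to read that data back out. The URL is hypothetical: the host, path, parameter names and values are all made up to mirror the kinds of fields listed above, and it is not the real HealthCare.gov address.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical URL: host, path, parameter names and values are illustrative only.
leaky_url = (
    "https://insurance.example/see-plans/results/"
    "?zip=12345&age=40&income=35000&smoker=1&pregnant=1"
)


def exposed_fields(url: str) -> dict[str, str]:
    """Return the query-string parameters that anyone receiving the URL can read."""
    return dict(parse_qsl(urlsplit(url).query))


fields = exposed_fields(leaky_url)
for name, value in fields.items():
    print(f"{name} = {value}")
```

No special access or cleverness is needed; the query string is plain text and a standard library call splits it into named fields.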
The URL doesn’t contain anything that names or identifies a specific individual but that doesn’t make it safe – it’s alarmingly easy to identify individuals from scant, anonymized data.
The government explicitly prohibits the companies from using the data in the referer and there is no suggestion that any of them are actually using the leaked data for their profiling (and, given the stakes, I suspect it’s unlikely they’d even consider it) but that doesn’t make it OK.
Most websites store logs of which pages have been visited and those logs often include the referring URL. Some companies back up that data for years, and since nobody will be expecting it to contain health information, it’s unlikely to be treated with the level of sensitivity required.
Which means that even if all 14 sites are operating with impeccable ethics, there could still be 14 separate, accidental copies of that leaked data hanging around for a long time with less than ideal protection applied.
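As a sketch of how that happens, the Python below parses one access log line in the widely used "combined" log format, which records the referer alongside each request. The log line itself is invented for the example (the IP address, timestamp, hostnames and browser name are all placeholders), and I'm assuming the third party's server uses this format; many do, but not all.

```python
import re

# Pattern for the "combined" access log format: client, identity, user, time,
# request line, status, size, referer, user agent.
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Invented log line: what a third party's server might record for a widget request.
sample_line = (
    '203.0.113.7 - - [20/Jan/2015:10:15:32 +0000] '
    '"GET /widget.js HTTP/1.1" 200 5120 '
    '"https://insurance.example/results/?age=40&smoker=1&pregnant=1" '
    '"ExampleBrowser/1.0"'
)

match = COMBINED_LOG.match(sample_line)
logged_referer = match.group("referer") if match else None
print(logged_referer)
```

Nobody has to do anything deliberate for this to happen: the sensitive URL ends up on disk, and in every backup of that disk, simply because the server is logging normally.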
As the EFF’s Cooper Quintin noted in a blog post, private health data should not be shared in this way, and it’s a “massive violation of privacy”:
People's private medical data should not be available to third party companies without consent from the user. This practice is negligent at best.
The sharing is accidental but it comes as a result of poor choices in the design of HealthCare.gov rather than HTTP itself.
Our privacy shouldn’t depend on the ethics of companies with a conflict of interest and we shouldn’t be in the business of trying to predict how somebody might be able to access our leaked data in future (or what they might cross reference that data with).
The principle of least privilege demands that the data shouldn’t have been there in the first place.
URLs are not sensitive information, and they shouldn’t contain it. They get found, indexed, spread around and stored through all sorts of different mechanisms including server logs, bookmarks, browser histories, search engine indexes and server status pages.
To avoid leakage, sensitive data should only be sent in the request body of an HTTP POST request and received in the message body of the response (via HTTPS of course).
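A minimal Python sketch of that approach is below. The endpoint, helper name and field names are hypothetical, and the request is only constructed, never sent, so you can inspect what would go where.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_plan_search(endpoint: str, details: dict) -> Request:
    """Build (but do not send) a POST request carrying details in its body."""
    body = urlencode(details).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical HTTPS endpoint and form fields.
request = build_plan_search(
    "https://insurance.example/see-plans/results",
    {"age": 40, "smoker": 1, "pregnant": 1},
)
print(request.get_method())
print(request.full_url)
print(request.data)
```

The URL stays free of personal details, so there is nothing sensitive for a referer header, a bookmark or a browser history entry to capture; the data lives only in the request body.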
End users can control referer headers and protect themselves from poorly designed, leaky sites with a range of plugins and configuration controls (too many to list here I’m afraid).
That’s only half the story though: third party code can greatly enhance the functionality of a website, but it enjoys very privileged access to any pages it’s included on, and gobbling up referer headers is just the tip of the iceberg.
Rather than focusing on what to do about referers specifically, it’s probably better to use plugins like NoScript, Ghostery or the EFF’s own Privacy Badger to control which third party sites you want to share anything with.
10 comments on “How the Obamacare website healthcare.gov leaks private data”
The referer header isn’t misspelled; it contains the URL for the website that referred the client to the next site. The “referral”, as you call it, would have to be the URL for the new target.
I didn’t mean that referer was a misspelling of referral, I meant it was a misspelling of referrer. However, the way I wrote it was less than great and your interpretation is probably the only sensible way to read it. Fixing it now…
Perhaps they could sell the information to FB and help drive down the costs and get rid of the ridiculous yearly deductibles that would bankrupt the average family they claim to be helping.
Nunos, since you told the truth about The-Not-So-Affordable Health Care Act, I have to presume that the peeps who thumbed you down either:
A) Don’t understand the concept of satire and being facetious, or
B) Still don’t get that Obamacare passed on the basis of Lie #1: like “If you like your doctor you can keep your doctor” or
C) Lie #2: Obamacare will save all the taxpayers money on health care, or
D) Don’t realize the subsidies are not free, ergo our taxes will have to be raised to pay for the subsidies we’re getting, or
E) Are still in denial about the mistake they made voting this man into office.
Either way, in less than a decade when all of this fiscal lunacy causes world backers to raise the US interest rate and we enter the tailspin of hyperinflation, most people who voted for Obama will lie and claim they didn’t.
Making this even worse, many of these HTTP requests will contain a unique identifier in a cookie, linking the referer header data to whatever else the user has done previously on the same third-party website, and to a username in the case of sites like Twitter and Facebook.
Simple answer to all these referrer problems (at least in the Firefox browser): In a new tab, type “about:config” in the address bar, click OK on “I promise to be careful”, move down to the entry network.http.sendRefererHeader, and change it from 2 to 0.
All your referrer problems go away.
Note: You can keep it this way forever but when you need to visit your bank or brokerage you may need to change the value to 1. (And then change it back to 0 when finished.)
Install the Lightbeam plug-in, and compare before and after!
For the what and why of Lightbeam, see:
Oh, and a quick explanation of those numbers:
network.http.sendRefererHeader = 2 –> always send the referrer data
network.http.sendRefererHeader = 1 –> send the referrer data for links clicked but not for images loaded
network.http.sendRefererHeader = 0 –> never send any referrer data
(I can’t find an option that sends the referrer but deliberately limits what it contains. I’d like to be able to say, “send the referrer for links only, and always chop it short before any URL parameters”.)
The “poor choices in the design” were cleverly enshrined to do just what it’s doing – invading our privacy. This is why the same designers are being hired by the IRS – to deliberately make the same “poor choices.”
Hmmm. As far as “deliberate poor choices” go, this is a peculiarly poor one 🙂 I think you may be finding what you set out to look for here.