A senior executive at privacy-focused browser company Brave has accused Google of using a workaround that lets it identify users to ad networks. The system violates GDPR – the EU’s data protection regulation – he said.
Brave’s chief policy and industry relations officer Dr Johnny Ryan made the accusation against Google’s Authorized Buyers (formerly DoubleClick, the advertising network which incorporates 8.4 million websites) in a blog post last week.
Whenever you visit a member site, Authorized Buyers logs the visit and what page you were looking at. This information, aggregated from the sites that you visit, forms a detailed profile about you. Authorized Buyers also does something else whenever you hit one of its member sites: it puts you up for auction. It takes bids from advertisers interested in showing you ads based on your profile. It happens in milliseconds, in a process called real-time bidding (RTB).
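As a rough illustration of the auction step, here is a minimal sketch in Python. It assumes the simplest possible rule, where the highest bid wins; the article does not describe how Authorized Buyers actually prices or ranks bids, and the Bid type, the run_auction function and the advertiser names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str  # hypothetical advertiser name
    amount: float    # bid price in arbitrary units


def run_auction(bids):
    """Return the highest bid, or None if nobody bids.

    Assumption: highest bid wins. Real exchanges may apply
    other pricing rules that are not covered here.
    """
    return max(bids, key=lambda b: b.amount, default=None)


# Made-up bids for a single page view.
bids = [
    Bid("advertiser_a", 1.20),
    Bid("advertiser_b", 2.05),
    Bid("advertiser_c", 0.90),
]
winner = run_auction(bids)
print(winner.advertiser)
```

The point of the sketch is only that every bidder has to see something about the visitor in order to decide how much to bid, which is where the data-sharing concerns come from.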
Ryan submitted a complaint about Authorized Buyers to the Irish DPA in September 2018, which prompted a formal investigation. He had three main concerns.
First, he said that what had started as a simple personalised advertising mechanism had morphed into a mass data collection system that collected more data than necessary and sent it on to numerous third parties.
Second, once that information was sent on, it was no longer secure or controllable.
Third, he worried that this data might include what GDPR calls ‘special category’ information. That’s data on sensitive subjects like sexual orientation, ethnicity, or political leanings.
A clever workaround?
GDPR calls for strict controls over the use and dissemination of personal data – especially special category data – and Google must comply with it because it deals with European residents, so how could it be doing this? In his blog post, Ryan accuses the search giant of using a clever workaround:
Analysis of the network log shows that the Data Subject’s personal data has been processed in Google’s Authorized Buyers RTB system. It further shows that Google has also facilitated the sharing of personal data about the Data Subject between other companies.
Push Pages therefore appear to be a workaround of Google’s own stated policies for how RTB should operate under the GDPR.
Ryan worked with third-party researcher Zach Edwards at web analytics company Victory Medium to analyse browsing sessions on a new machine that he hadn’t used before.
In an email interview, Edwards told Naked Security that Google has historically tracked its users with an identifier called google_user_id. Demand-side platforms (DSPs) – companies that manage multiple advertising purchases on behalf of advertisers – could use these identifiers to understand who users were and what they were doing.
The identifiers were what Edwards calls shared strings, and because they lacked consent, they didn’t comply with GDPR, he warned. Google announced a year ago that it was phasing these out for European users by the end of this year.
I’m certain Google wanted to keep the google_user_id field, but it’s not GDPR compliant – they had to trash it. It’s a unique user identifier shared across multiple companies.
Edwards and Ryan discovered a new mechanism that they call push pages. These all come from the same Google web address, but they each append a pseudo-anonymous unique identifier to the address. These identifiers rotate every 14 days. Advertisers can still use them to identify users, according to Edwards, but Google only gives them to the auction winner and any DSPs that it synchronizes with to optimize future auctions. He explained that “slight limiting of the shared strings” and “putting it behind the scenes” is what makes this a GDPR workaround.
However, he argued that push pages still fall foul of GDPR:
Multiple DSPs are given that same string, which is what puts the entire cookie_push.html structure out of GDPR compliance.
DSPs match unique identifiers (cookies) with the information that they have about a website visitor using a mechanism called match tables. The idea is that a DSP should only be able to collaborate with Google on a match table so that only it and Google have data about a user. Google forbids DSPs from collaborating together on their match tables to find out more about website visitors.
However, Edwards said that the unique identifiers found in push pages break that rule:
Basically, Google has TOS that prevent companies from collaborating on match tables, but then Google turns around and gives them a shared string.
He accused Google of not auditing or controlling what happens to these push page identifiers after DSPs received them. In at least one case, he claimed a DSP was sharing the identifier with other companies.
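To see why a shared string matters, consider a minimal sketch in Python of two DSPs that each hold their own match table. Everything here is hypothetical: the identifiers, the cookie values, the attributes and the join_on_shared_id function are invented for illustration and do not come from Google or from Edwards’ research. The sketch only assumes that both tables are keyed by the same identifier.

```python
def join_on_shared_id(table_a, table_b):
    """Merge two match tables on the identifiers present in both."""
    shared = table_a.keys() & table_b.keys()
    return {uid: {**table_a[uid], **table_b[uid]} for uid in shared}


# Hypothetical match tables, each keyed by the same shared identifier.
dsp_a = {
    "push-id-123": {"dsp_a_cookie": "a-777", "interest": "travel"},
}
dsp_b = {
    "push-id-123": {"dsp_b_cookie": "b-042", "city": "Dublin"},
    "push-id-999": {"dsp_b_cookie": "b-100", "city": "Madrid"},
}

combined = join_on_shared_id(dsp_a, dsp_b)
print(combined)
```

If each DSP only ever saw an identifier unique to its own relationship with Google, there would be no common key to join on. With a common key, pooling what each company knows about the same visitor becomes a simple lookup.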
Ryan’s isn’t an isolated complaint. Jim Killock, executive director of the Open Rights Group, and Michael Veale, a professor at University College London, submitted duplicate complaints to the UK Information Commissioner’s Office (ICO) in September 2018. That resulted in a report from the ICO, published in June 2019, which it passed to the adtech industry for comment. It said:
Thousands of organisations are processing billions of bid requests in the UK each week with (at best) inconsistent application of adequate technical and organisational measures to secure the data in transit and at rest, and with little or no consideration as to the requirements of data protection law about international transfers of personal data.
It added that adtech companies are processing data for these auctions unlawfully, and that they aren’t being clear enough with people about the privacy implications. It said that it wants changes, and will review things at the end of the year.
Concern over Authorized Buyers’ practices is mounting. Activists have also filed duplicate or similar complaints in Belgium, Luxembourg, the Netherlands, Poland, and Spain.
A Google spokesperson told us:
We have strict policies that prohibit advertisers on our platforms from targeting individuals on the basis of sensitive categories such as race, sexual orientation, health conditions, pregnancy status, etc. If we found ads on any of our platforms that were violating our policies and attempting to use sensitive interest categories to target ads to users, we would take immediate action.
9 comments on “Brave accuses Google of sidestepping GDPR”
In light of this, when will you stop using Google to track your users? This is the ironic email link that brought me to this story: https://nakedsecurity.sophos.com/2019/09/09/brave-accuses-google-of-sidestepping-gdpr/?utm_source=Naked+Security+-+Sophos+List&utm_campaign=78782b9593-Naked+Security+-+Sep+2019+-+ad+B+%28G2%2C4%29&utm_medium=email&utm_term=[item removed]
If you use Gmail, you will notice that links often (even when you send it to yourself) get modified by the Goog so they get a Hit for linking to that page – which is very dirty BS. But the Goog does what it wants. Seems to do this when I’m at my PC all the time, but not on the phone. hmm
CB, I’m not sure where Google is involved. That URL permits Sophos.com and WordPress.com and Automattic.com (WordPress’s host) to track you, but not Google. Why do you think Google is involved?
(That long string after “utm_term=” may represent an individual userID. Mine is the same up to the dash, then varies from yours.)
For the avoidance of doubt, URL parameters that start utm_ are Google Campaign parameters for use with Google Analytics. Google Analytics is a web analytics package that allows us to see how much traffic Naked Security gets, which articles are popular and so on. Visitors to Naked Security receive Google Analytics cookies, which allow us to see things like how long visitors spend on the site and whether or not they’ve visited before. Visitors are not personally identified (and, although we wouldn’t do it even if it were allowed, it is against Google Analytics’ terms of service to associate session data with any personally identifiable information).
The parameters in the email URLs allow us to determine what proportion of Naked Security’s traffic comes from its daily email newsletter each day.
It’s very difficult to operate a website without web analytics in any context and effectively impossible in a commercial environment, where being able to demonstrate the size of the site’s audience is directly linked to its continued existence.
We think that Google Analytics is the least worst compromise. The way it works is very well understood and it’s easy to block if you want to. Session tracking is based on cookies which are part of the HTTP standard, and the most widely understood and easy to block identifiers (as opposed to ETags, canvas fingerprints and other exotic techniques designed to make it hard for you to block them).
Mark Stockley wrote: “The parameters in the email URLs allow us to determine what proportion of Naked Security’s traffic comes from its daily email newsletter each day.”
Mark, if that’s the case, why do the utm tokens vary between users? The token for the same article on the same day doesn’t need to vary. I wasn’t suspicious before, but now I am wondering.
Hi Laurence, I could have been clearer, sorry. I was explaining how we use the information, not all the possible uses of it.
You are correct that the utm_term parameter carries a unique user ID.
The integration between Mailchimp (our email provider) and Google Analytics is a “one tick” process that triggers Mailchimp to add Google Campaign parameters to links. The process is one-way – data is sent from Mailchimp to Google Analytics in the URLs as described but no data is sent from Google Analytics to Mailchimp.
We don’t currently use the ID added by Mailchimp to URLs, but only because we use a different identifier that does the same job (a cookie assigned by Google Analytics when you visit our site). Being able to differentiate unique visitors is important in any web analytics system because 100 clicks by one person is not the same as 1 click by 100 people.
Mark Stockley wrote: “Being able to differentiate unique visitors is important in any web analytics system because 100 clicks by one person is not the same as 1 click by 100 people.”
Mark, I don’t get it. Sophos offers no prizes, nor do they win prizes for getting the most clicks. Why would anyone bother to pile clicks on Naked Security articles? At most you would double-count a couple of duplicates from people like me who hesitated before coming back a second time and making this response.
And why couldn’t that be done with IP addresses, anyway?
I’m not tin-hat paranoid, but this seems like overkill. I manage four websites and don’t go to this sort of effort.
Laurence, I don’t see any difference at all between the strings posted by you and “CB”. Are my tired old eyes really getting that bad?
Laurence is correct, see my answer above.