Data-matching: what happens when firms join the dots about you?

You may not have heard of data matching, but I guarantee you’ve heard of the companies that do it. Data matching is where a company takes data it holds internally, matches it against publicly available data, analyses the result, and then uses it for whatever its business goals are: raising money, targeting people, and so on.

Recently, Uber was caught out for using a program it called Greyball to dodge law-enforcement officials in cities where the service was being rolled out. Essentially Greyball carried out a data-matching process to figure out whether users were government officials or not.

The New York Times explains in more detail how it did this. Uber employees cross-matched usernames with social media profiles, employed “geofencing” around government offices to identify potential officers, and assessed whether credit card information was tied to an institution. And apparently it worked quite effectively, since the company managed to evade law enforcement in several US cities.
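To make the “geofencing” idea concrete, here is a minimal sketch of a radius check around a fixed point. It is not Uber’s code, and nothing in the reporting says how Greyball implemented it. The coordinates, the 200-metre radius, and the function names are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_geofence(point, centre, radius_m):
    """True if `point` lies within `radius_m` metres of `centre`."""
    return haversine_m(*point, *centre) <= radius_m


# Hypothetical coordinates standing in for a government office.
OFFICE = (51.5000, -0.1250)

# Someone opening an app a few dozen metres away falls inside the fence;
# someone about a kilometre away does not.
near = inside_geofence((51.5005, -0.1250), OFFICE, radius_m=200)
far = inside_geofence((51.5100, -0.1250), OFFICE, radius_m=200)
```

On its own, a check like this says only that a phone was near a building. It becomes identifying when combined with other signals, which is the point of data matching.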

Even charities have been caught red-handed using data matching to further their fundraising goals. Late last year the UK’s data protection authority, the Information Commissioner’s Office (ICO), fined the RSPCA and the British Heart Foundation for breaching the Data Protection Act. The ICO found they did this by using data matching to target new donors, by trading personal data with other charities, and by screening donors using their data without their consent.

Most important, perhaps, is the question of using data matching to further political aims. I have written about the limitations of Cambridge Analytica – which has been credited with winning Donald Trump the White House and helping win the UK’s referendum on leaving the European Union – and its approach to Facebook data, but never touched on the potential for data matching.

Outside of using Facebook’s tools, there are of course many ways in which the data can be extracted and matched to data that’s held elsewhere. And it doesn’t require a stretch of imagination to think that Cambridge would have access to donor lists or voter lists, whose data could be extrapolated to create an extended pool of potential supporters.

How ethical and legal is this?

It’s extremely tricky to legislate for this sort of behaviour and even harder to enforce it. After all, if people leave breadcrumbs of their identity around the internet and these pieces of data can be matched together, what’s the harm? Most of the data is freely available and not obtained by nefarious means.

The problem is volume – there is so much personal data spread about online that matching it together can give companies insights into your life that you never expected them to have. What’s more, users generally haven’t explicitly consented to companies using data-matching methods to mine the internet for more information about them.

Data matching is limited in some ways because most of us don’t have unique names, and there tend to be at least a handful of other people around with the same one. So what happens if someone is trying to match my details to public data and instead they get another Sophie Warnes, who might have a different social media presence or a different job?

This happened to a friend of mine in a PR stunt gone wrong – she was sent a “dossier” on herself, which detailed where she lived, and assumed the person living with her was her partner. Only… it was another woman in the same city with the same name. It’s a bit… Orwellian.
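The name-collision problem can be sketched in a few lines. This is an illustrative example only: the records, field names, and the `match_by_name` helper are invented, and real matching pipelines use more fields than a name. It shows why a match on name alone should be treated as ambiguous whenever more than one public record fits.

```python
from collections import defaultdict


def normalise(name):
    """Lower-case the name and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def match_by_name(internal, public):
    """Match internal records to public ones by name, flagging ambiguous hits."""
    index = defaultdict(list)
    for rec in public:
        index[normalise(rec["name"])].append(rec)

    results = {}
    for rec in internal:
        candidates = index.get(normalise(rec["name"]), [])
        if len(candidates) == 1:
            status = "unique"
        elif candidates:
            status = "ambiguous"  # several people share this name
        else:
            status = "no_match"
        results[rec["id"]] = (status, candidates)
    return results


# Invented sample data: two different people in the same city share a name.
internal = [{"id": 1, "name": "Alex Example"}, {"id": 2, "name": "Sam Sample"}]
public = [
    {"name": "Alex Example", "city": "Leeds", "job": "teacher"},
    {"name": "alex  example", "city": "Leeds", "job": "nurse"},
    {"name": "Sam Sample", "city": "York", "job": "baker"},
]

results = match_by_name(internal, public)
```

A careless pipeline would take the first candidate for “Alex Example” and build a dossier on the wrong person, which is what happened in the anecdote above.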

Minimising the risks

Many of us assume that the data we give away to companies is protected, precisely because the Data Protection Act specifically stipulates that companies can only use the data in certain ways. In the US, there are also federal laws around privacy that cover this topic in some ways – indeed, Spotify, Spokeo, and several other US websites have been sued over the way they tracked users on the web.

In the case of Facebook, lessening the risk of being targeted or used as part of a mass data-gathering campaign is relatively easy. Make sure your account is locked down and that as much as possible is set to private. Don’t “Like” any fan pages unless absolutely necessary. Don’t do quizzes, no matter how fun they are. Install the Chrome extension DataSelfie to audit your Facebook usage and see exactly how much information you’re giving away.

When it comes to installing new services or opening new accounts, make sure you know how your data will be used. Specifically, avoid checking the box that mentions sharing data with third parties. You don’t want your data being sold on by these companies so that you can be harassed by something you didn’t explicitly sign up for.

The biggest problem is the wealth of “publicly available data” that companies can get their hands on, and this is much harder to counter. Many sites aggregate this data, and some use it together with social media accounts, email accounts, and anything else they can find.

The thing is, while this data was always publicly available – records are created from the electoral roll and other public records – it has always been tucked away in physical locations like town halls or libraries. Technically publicly available, yes – but not available at the click of a button. Until recently.

What’s more, these aggregators make it pretty difficult for you to opt out, and while they must be transparent in order to comply with the law, they hardly advertise the fact that you can remove your personal data.

In the US, removing yourself from sites such as BeenVerified and Spokeo is a lot harder. In fact, you often need to sign up with them and give them more of your personal data (copies of ID cards, etc) in order to be removed.

And then there’s the problem of them re-accessing your data and putting you on again. “Getting your data off once is not enough because the sites buy data and aggregate more info continually, making it likely that if you don’t take precautions, you’ll be put back in,” a ZDNet article about removing yourself from US search websites says.

While you can get your data removed from these sites, it might take a while, and the chances are that the data will reappear as they access it again. This is why you need to tackle it at source.

There are two versions of the UK’s electoral register. The main one is used only for elections, to prevent fraud, and to check credit or financial applications. The other is the open, or edited, register, which is available for sale and can be used for marketing purposes. To get off the latter, you need to contact the local council (wherever you registered to vote) and tell them that you want to be taken off the open/edited register.

TL;DR: how to protect yourself online

  • Always check the terms and conditions of anything you sign, and opt out of giving third-party companies your data
  • Ask data-harvesting companies to remove your details – you usually need to fill out a form
  • Ask to be taken off the edited/open electoral register (if in the UK)
  • Log out of Facebook when browsing the web elsewhere to prevent companies tracking you
  • Use Incognito mode on Chrome
  • Use a VPN (Virtual Private Network) service when accessing the internet outside trusted networks
  • Don’t do quizzes on Facebook or external sites: you have no idea what they’re using the data for or what it could be used for in future
  • Keep social media accounts as separate and unlinked as possible
  • Use a different name online from the one on the electoral roll

What next?

In the UK, the ICO is now investigating Cambridge Analytica, citing “concerns about Cambridge Analytica’s reported use of personal data”.

The company says that it doesn’t have access to Facebook data, and that the information discussed “relates to a research project”. The ICO hopes to publish its findings on the case later this year. It will be an interesting one to watch, as it raises serious questions about these data-matching tactics, which are used by marketing professionals worldwide.