Entire Oakland Police license plate reader data set handed to journalist

License plates. Image courtesy of Shutterstock.

Howard Matis, a physicist who works at the Lawrence Berkeley National Laboratory in California, didn’t know that his local police department had license plate readers (LPRs).

But even if they did have LPRs (they do: they have 33 automated units), he wasn’t particularly worried about police capturing his movements.

Until, that is, he gave permission for Ars Technica’s Cyrus Farivar to get data about his own car and its movements around town.

The data is, after all, accessible via public records law.

Ars obtained the entire LPR dataset of the Oakland Police Department (OPD), including more than 4.6 million reads of over 1.1 million unique plates captured in just over 3 years.

Then, to make sense out of data originally provided in 18 Excel spreadsheets, each containing hundreds of thousands of lines, Ars hired a data visualization specialist who created a simple tool that allowed the publication to search any given plate and plot its locations on a map.
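Ars didn't publish the internals of that tool here, but the core idea of a plate lookup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names (`plate`, `timestamp`, `lat`, `lon`), the sample rows, and the function names are all made up, and the real spreadsheets may be laid out quite differently.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only. The column names are
# assumptions, not the actual layout of the OPD spreadsheets.
SAMPLE_CSV = """plate,timestamp,lat,lon
ABC123,2013-01-05T08:14:00,37.8044,-122.2712
XYZ789,2013-01-05T08:15:30,37.8100,-122.2600
ABC123,2013-02-11T17:42:10,37.8049,-122.2708
"""


def index_reads(csv_file):
    """Group every read by plate so one lookup returns a vehicle's sightings."""
    reads_by_plate = defaultdict(list)
    for row in csv.DictReader(csv_file):
        reads_by_plate[row["plate"]].append(
            (row["timestamp"], float(row["lat"]), float(row["lon"]))
        )
    return reads_by_plate


def lookup(reads_by_plate, plate):
    """Return a plate's sightings in time order (ISO timestamps sort as text)."""
    return sorted(reads_by_plate.get(plate, []))


index = index_reads(io.StringIO(SAMPLE_CSV))
for sighting in lookup(index, "ABC123"):
    print(sighting)
```

Each tuple that comes back is a time and a pair of coordinates, which is all a mapping library needs to drop a pin. That is the worrying part: once the reads are grouped by plate, tracing one car's movements is a single dictionary lookup.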

That’s when things got a bit more worrisome for Mr. Matis.

After Ars ran his plate, the journalists were able to show the physicist a map of the five instances where a camera had captured his car, guessing (correctly) that they were near where he lived or worked: places where, Matis confirmed, he and his wife go “all the time”.

He hadn’t been worried about the police having his movement data. But the thought of it being stored indefinitely, even though he wasn’t under investigation, and then handed over to anybody who simply asked? That’s where the creep factor came in, he told Ars:

If anyone can get this information, that’s getting into Big Brother. If I was trying to look at what my spouse is doing, [I could]. To me, that is something that is kind of scary. Why do they allow people to release this without a law enforcement reason? Searching it or accessing the information should require a warrant.

This is the letter that he immediately sent to his city council member:

Do you know why Oakland is spying on me and my wife? We haven't done anything too radical or illegal.

I gave my license plate to a journalist and he found my wife's and my car in their database. One of the locations is right near our house.

The astounding thing about this information is that anyone, and I mean anyone, can get this information. Some of the information is more than two years old.

I can see lawyers using this information for lawsuits. I can check where my wife is located. Car companies can see my habits. Insurance companies can check up on their clients. We have entered the world of 1984 with the difference that anyone can get the information.

Matis’s concern is justified.

Many people, when asked how they feel about surveillance, shrug it off, claiming that they don’t have anything to hide.

They’re wrong. We all have something to hide – not because we’re guilty of crimes, but because we deserve data privacy.

Such privacy is crucial for a number of reasons. For one thing, it shields us from persecution, whether it concerns our race, religion, gender, political orientation, or any other of a vast number of personal attributes.

Can our geolocation reveal such things about us?

Absolutely. Catherine Crump, a law professor at the University of California, Berkeley, made this point when talking to Ars:

Where someone goes can reveal a great deal about how he chooses to live his life. Do they park regularly outside the Lighthouse Mosque during times of worship? They’re probably Muslim. Can a car be found outside Beer Revolution a great number of times? May be a craft beer enthusiast - although possibly with a drinking problem.

As Naked Security often stresses in our reporting about Big Data, we have to stop thinking about data sets in terms of individual records and start thinking about them in terms of huge networks of possible relationships that exist between those records.

As Paul Ducklin recently pointed out, license plate readers are a good example of how seemingly innocuous pieces of discrete data – where your license plate was, and when – turn into something entirely different when amassed in huge data sets and cross-correlated, given that your plate number stays constant while your location changes.

There are properties and capabilities that emerge from large collections of data that don’t exist in the same data at smaller scales (it’s why we had to invent a term – Big Data – to describe it).

While a single data point about a license plate could be, and has been, used to do things such as track fugitives or solve a gang-related homicide, there’s no saying what the government can do with massive amounts of correlated data spanning years of collection, the vast majority of which has been gathered from innocent people who aren’t breaking any laws.

As a group of MIT graduate students outlined in this paper, even supposedly vague/imprecise/anonymised data can tell you who’s who once your data set gets big enough.

In fact, anonymity fell off the data like tissue paper in a rainstorm when the data sets got big enough, as Paul writes:

When the authors knew the details of any four transactions you'd made during the three-month data period, as, for example, would any shop that you had visited four times, they had a chance lower than 15% of guessing which anonymous tag in the file was yours.

But with 10 known transactions, something you might easily rack up with multiple retailers due to daily habits at a coffee shop, a parking lot, or a newsagent, their chance of pinpointing you rose above 80%.
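To see why each extra known data point matters so much, consider a toy sketch in Python. It is not the MIT authors' method, and the tags, places and days are invented purely for illustration. It simply counts how many "anonymous" tags remain consistent with what an observer already knows.

```python
# Made-up toy data for illustration: anonymous tag -> set of
# (place, day) observations. None of this comes from a real data set.
RECORDS = {
    "tag_a": {("cafe", 1), ("garage", 1), ("cafe", 2), ("newsagent", 3)},
    "tag_b": {("cafe", 1), ("garage", 1), ("gym", 2), ("newsagent", 3)},
    "tag_c": {("cafe", 1), ("gym", 2), ("bakery", 3)},
}


def candidates(records, known_points):
    """Return the tags whose records contain every known observation."""
    known = set(known_points)
    return sorted(tag for tag, points in records.items() if known <= points)


# Each additional known observation can only shrink the candidate list.
print(candidates(RECORDS, [("cafe", 1)]))
print(candidates(RECORDS, [("cafe", 1), ("garage", 1)]))
print(candidates(RECORDS, [("cafe", 1), ("garage", 1), ("cafe", 2)]))
```

With one known visit, all three tags are still candidates; with two, only two remain; with three, only `tag_a` is left. Real data sets are vastly larger, but the same narrowing is at work.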

Oakland, it’s time we had a talk.

For a city in laid-back California, you’re pretty jittery. It looks a bit like your data collection habit is getting out of control.

As it is, you’ve been one of the biggest surveillance hotspots for years, in a country where cities are increasingly gobbling up data on residents and ignoring privacy.

You’re gathering it. You’re retaining it. You’re passing it out to journalists.

To echo Mr. Matis: why are you spying on him and his wife?

Why are you spying on all your other citizens, come to think of it?

It can’t be for solving crimes, since, as Ars reported, your “hit rate” of reading license plates of people who are actually under suspicion is at 0.16 percent.

It’s time to rethink your ways. You, and much of law enforcement.
