Because of Edward Snowden, we’ve been hearing a lot about metadata for the past 15 months.
Governments have been getting that metadata by covert means of questionable legality.
As news about surveillance efforts has leaked, officials have typically downplayed it as “just” metadata – as if metadata didn’t reveal just about as much about us as email or phone call content itself.
As author and researcher Hans de Zwart of Dutch digital rights foundation Bits of Freedom noted, even recently, on its website, the Dutch Intelligence Agency (AIVD) downplayed the interception of metadata as “a minor infringement of privacy”.
Thanks to a Dutch man, Ton Siedsma, we now have a glimpse of the type of information that can be squeezed from what officials would have us believe is “just” metadata.
→ For a quick and accessible explanation of metadata and what it’s all about, see our recent Patch Tuesday analysis, where we explain why simply knowing if a file is there, without being able to look inside it, can tell you almost as much as knowing what it contains.
Siedsma voluntarily handed a week’s worth of mobile phone data over to researchers, one of whom was Hans de Zwart.
Siedsma didn’t just give up the geolocation details of his wanderings, mind you.
Siedsma allowed researchers access to the same type of metadata that intelligence agencies would collect, including phone and email header information, by letting the researchers install a data-collecting app on his phone.
The app pulled off a blizzard of data:
From one week of logs, we were able to attach a timestamp to 15,000 records. Each time Ton's phone made a connection with a communications tower and each time he sent an email or visited a website, we could see when this occurred and where he was at that moment, down to a few metres. We were able to infer a social network based on his phone and email traffic. Using his browser data, we were able to see the sites he visited and the searches he made. And we could see the subject, sender and recipient of every one of his emails.
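To make the social-network inference concrete, here is a minimal sketch in Python of how contact frequency alone can rank someone's closest relationships. The record layout, addresses and counts are hypothetical, invented purely for illustration; the researchers say only that they used common software, not how their analysis was written.

```python
from collections import Counter

# Hypothetical metadata records: (timestamp, channel, sender, recipient).
# No message content appears here, only who contacted whom, when and how.
records = [
    ("2013-11-18T08:02", "email", "ton@example.org", "merel@example.org"),
    ("2013-11-18T09:15", "email", "mp.assistant@example.org", "ton@example.org"),
    ("2013-11-18T12:40", "phone", "ton@example.org", "merel@example.org"),
    ("2013-11-19T19:05", "email", "merel@example.org", "ton@example.org"),
    ("2013-11-20T10:30", "email", "ton@example.org", "sister@example.org"),
]

def contact_counts(records, owner):
    """Count how often the owner exchanged traffic with each other party."""
    counts = Counter()
    for _timestamp, _channel, sender, recipient in records:
        if sender == owner:
            counts[recipient] += 1
        elif recipient == owner:
            counts[sender] += 1
    return counts

# Contacts sorted from most to least frequent.
ranked = contact_counts(records, "ton@example.org").most_common()
```

Even on five made-up records the most frequent contact stands out; with thousands of real records, the same counting would separate partners, colleagues and occasional contacts.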
These are some of the basic things that the researchers discerned from just one week’s worth of metadata from Siedsma’s life:
Ton is a recent graduate in his early twenties. He receives emails about student housing and part-time jobs, which can be concluded from the subject lines and the senders. He works long hours, in part because of his lengthy train commute. He often doesn’t get home until eight o’clock in the evening. Once home, he continues to work until late.
The researchers aren’t sure whether he lives with his girlfriend, Merel, but they do know that the couple exchange an average of a hundred WhatsApp messages a day, mostly when Ton’s away from home.
They know he’s interested in sports. That he’s into cycling. His sister’s name.
He reads Scandinavian thrillers. Or, well, at least that he searches for titles on Google and Yahoo.
He’s probably Christian. He enjoys reading about “cats wearing tights”, “Disney princesses with beards” and “guitars replaced by dogs”.
The marketing angle
The researchers also found that Ton would be like candy to online marketers:
If we were to view Ton’s profile through a commercial lens, we would bombard him with online offers. He’s signed up for a large number of newsletters from companies like Groupon, WE Fashion and various computer stores. He apparently does a lot of shopping online and doesn’t see the need to unsubscribe from the newsletters. That could be an indication that he’s open to considering online offers.
His political leanings
We ... suspect that he sympathises with the Dutch 'Green Left' political party. Through his work ... he’s in regular contact with political parties. Green Left is the only party from which he receives emails through his Hotmail account. He has had this account longer than his work account.
The researchers discerned that Ton is knowledgeable about, and very interested in, technology, information security, privacy issues and internet freedom:
He frequently sends messages using encryption software PGP. He performs searches for database software (SQLite). He is a regular on tech forums and seeks out information about data registration and processing. He also keeps up with news about hacking and rounded-up child pornography rings.
His metadata also makes it crystal clear where he works and in what capacity:
Based on the data, it is quite clear that Ton works as a lawyer for the digital rights organisation Bits of Freedom. He deals mainly with international trade agreements, and maintains contact with the Ministry of Foreign Affairs and a few Members of Parliament about this issue. He follows the decision-making of the European Union closely. He is also interested in the methods of investigation employed by police and intelligence agencies. This also explains his interest in news reports about hacking and rounded-up child pornography rings.
Beyond that, one of the researchers, security expert Mike Moolenaar, concluded that Ton has “a good information position within Bits of Freedom” – a detail that’s important from an intelligence perspective.
Some of the metadata that could have brought the information sifters to that conclusion include Ton’s frequent correspondence with anti-virus software providers and his emails to set up an appointment with a member of parliament’s assistant.
The password pièce de résistance
Information about us is one thing. But what about actually breaking into our accounts?
Can metadata lead governments, or cybercrooks, or any other type of snoop to guess our passwords?
Absolutely. Here’s how the researchers did it with Ton’s data:
First, they compared the data with a file of leaked passwords from the horrific Adobe breach of 150 million user names and passwords.
As you may recall, while the passwords were supposedly “encrypted” (although we don’t know in what way), the password hints were not.
The analysts saw that some users had the same password as Ton. They took a look at their password hints: “punk metal”, “astrolux” and “another day in paradise”, and that led them to his password:
This quickly led us to Ton Siedsma’s favourite band, Strung Out, and the password 'strungout'.
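The hint-matching step can be sketched roughly as follows. This is an illustration, not the researchers’ actual tooling: it assumes only that identical passwords were stored as identical values in the leaked file, so that other users’ plaintext hints can be pooled against the target’s entry. The rows, addresses and stored values below are made up; the three hints are the ones quoted above.

```python
from collections import defaultdict

# Hypothetical rows in the shape (email, stored_password_value, hint).
# The stored values are opaque placeholders, not real breach data.
leaked = [
    ("target@example.org", "VALUE-A", ""),
    ("user1@example.com", "VALUE-A", "punk metal"),
    ("user2@example.com", "VALUE-A", "astrolux"),
    ("user3@example.com", "VALUE-A", "another day in paradise"),
    ("user4@example.com", "VALUE-B", "my cat"),
]

def hints_for(leaked, target_email):
    """Collect hints left by other users whose stored value matches the target's."""
    hints_by_value = defaultdict(list)
    target_value = None
    for email, value, hint in leaked:
        if email == target_email:
            target_value = value
        elif hint:
            hints_by_value[value].append(hint)
    if target_value is None:
        return []
    return hints_by_value[target_value]

pooled_hints = hints_for(leaked, "target@example.org")
```

The target never has to write a hint himself: anyone else who chose the same password and left a revealing hint gives it away on his behalf.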
Using that password, they got into Ton’s Twitter, Google and Amazon accounts.
Besides taking screenshots of normally confidential direct messages on Twitter, the analysts could have actually purchased things on Ton’s Amazon account, but they opted not to.
Ton, we hope you’ve since changed your ways when it comes to picking passwords, and that you’re using a unique, strong password for each of your sites instead of reusing the same one.
But more to the point, the researchers called this very complete portrait mere “child’s play” when compared with what intelligence agencies can do:
We focused primarily on metadata, which we analysed using common software. We refrained from undertaking additional investigation, with the exception of using the leaked dataset from Adobe.
Besides the success they had with just a limited tool set, the researchers underscored the fact that they only had access to one week’s worth of metadata – a fraction of what intelligence agencies have:
An intelligence agency has metadata on many more people over a much longer period of time, with much more advanced analysis tools at its disposal. Internet providers and telecommunications companies are required by law in the Netherlands to store metadata for at least six months. Police and intelligence agencies have no difficulty asking for and receiving this kind of data.
As goes the Netherlands, so goes the US and other countries implicated in NSA-gate.
So the next time you hear a politician append the word “only”, “mere” or “just” to the term “metadata”, think of Ton Siedsma.
Think about how much you now know about him. Bear in mind that this intimate, detailed portrait comes courtesy of your mobile phone and the immense wealth of metadata it has the power to silently hand over.
Image of eye courtesy of Shutterstock.
12 comments on “Just how much information can be squeezed from one week of your metadata?”
So you told us the who, what and how. But now, how do we stop it all?
There’s a reason for that. In order to stop it all, you have to get your politicians to recognize AND admit that metadata is private and valuable. Then they have to pass legislation to prevent the government from collecting it en masse.
This isn’t a problem that can be fixed by the end user, apart from refusing to use services such as cellular phones, Internet communication and electronic payment. Even then, you’ll still get caught up in the metadata net of the people you interact with.
This issue is more about raising global awareness than it is about stopping the collection of metadata. Once you realize how much governments and private industry are actually collecting about you, and that they have the tools to do both bulk and targeted analysis of that information, it will affect the way you interact.
Being forewarned means that everyone will be able to make more informed decisions about how and when they disclose further private information, and how valuable such information may actually be in filling in the holes in the metadata net.
There are various things you can do that make a bit of a difference, but there is a cost in convenience. For example, you could turn your mobile phone off when you aren’t expecting calls. You’ll still get voicemails, but you won’t leave a permanent record of where you were (or where your phone was) at every moment of the day. There’s a cost – it’s less convenient; it could cost you valuable time in an emergency; and it might simply not be practical given your business.
“simply knowing if a file is there, without being able to look inside it, can tell you almost as much as knowing what it contains”
and simply knowing what kind of thing the authorities can do means that you can avoid them finding out about your bomb plot, people-trafficking business, contraband tobacco deal. Thanks a bunch; I feel much safer…not.
Anyone stupid enough to use passwords easily guessed like that almost deserves to get hacked. Call me an angry cynic but that’s just the way I see it. You need to have something obscure or that has no relevance or phonetic meaning. It also needs to include uppercase, lowercase, and numbers. Symbols if they’re included in the options. Also make sure it’s over 8-9 characters long, otherwise I can crack the MD5 hash if it ever gets leaked (assuming it isn’t salted).
Don’t be an idiot, use a strong password and don’t use the same one everywhere.
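The point about unsalted MD5 can be illustrated with a short Python sketch. It is a toy under stated assumptions: the three-word list stands in for a real cracking dictionary, and `md5_hex` is a helper named here for illustration. It is not a recommendation to store passwords with MD5, salted or not.

```python
import hashlib
import os

def md5_hex(password, salt=b""):
    """Hex MD5 digest of an optional salt followed by the password."""
    return hashlib.md5(salt + password.encode("utf-8")).hexdigest()

# A tiny stand-in for an attacker's precomputed dictionary of common passwords.
wordlist = ["password", "letmein", "strungout"]
lookup = {md5_hex(word): word for word in wordlist}

# Unsalted: the leaked hash is a direct key into the precomputed table.
leaked_unsalted = md5_hex("strungout")
cracked = lookup.get(leaked_unsalted)

# Salted: a random per-user salt changes the digest, so the same table misses.
salt = os.urandom(16)
leaked_salted = md5_hex("strungout", salt)
missed = lookup.get(leaked_salted)
```

A salt only defeats precomputed tables. An attacker who also has the salt can still hash each dictionary word with it, so a guessable password stays guessable.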
What was a strong password yesterday may not be any good today. Already there are cracking systems that can easily deal with passwords made up of two random words that have six digits on the end of them and i’s changed to 1’s etc. “Apr!l221973” is no longer a good password.
Also “you deserve what you get” is an unhelpful elitist attitude from the computer users of last century. Computers are now for everyone and most people have no idea about security, nor want to care. It is up to the computer industry to ensure people do the right thing, either via education or by making sure their software enforces good security practices. Unfortunately much of the computer industry only pays lip service to security; all the big hacks taking place point to a huge problem that needs to be addressed, in my view, urgently.
actually use 16+ char..
Guessing you are talking about password length?
A lot of sites sadly restrict your password length to 16 characters and only allow A-Z, 0-9 and common symbols. :/
Doesn’t the installed “data collection app” on his phone void some of the argument? And in Economic theory, for instance, (Theory of the second best) if ONE assumption is false then ALL the derived results are also false.
I’m also desperately trying to remember what information is in the clear text parts of SMTP headers. I do remember that sendmail only logged sender and recipient data and not subjects. Maybe ISP anti-spam apps log a lot more now.
Also, how do they prove that he actually read any of the bulk mailings? I use mailwasher to eye-scan my incoming e-mail that has got past my ISP’s anti-spam filters and delete a lot without reading more than sender and subject. My computer may have read the e-mail and created a metadata log entry, but I didn’t read the e-mail.
Not really, GCHQ/NSA tap data cables so can gather the metadata without having to have access to the device.
SMTP headers include sender, recipient, sending mail host, subject, date etc.
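For anyone who wants to see this for themselves, here is a small Python sketch using the standard library’s `email` module. The message is made up. Strictly speaking, the subject and date live in the message headers that travel with the mail, while the SMTP envelope carries only the sender and recipient addresses, which would explain why a mail server log can show less than the headers do.

```python
from email import message_from_string

# A made-up message: a header block, a blank line, then the body.
raw = (
    "From: Alice <alice@example.org>\n"
    "To: Bob <bob@example.com>\n"
    "Subject: Appointment next week\n"
    "Date: Mon, 18 Nov 2013 08:02:00 +0100\n"
    "\n"
    "Body text that a metadata collector would not need to read.\n"
)

def header_metadata(raw_message):
    """Extract the usual metadata fields without touching the body."""
    msg = message_from_string(raw_message)
    return {name: msg[name] for name in ("From", "To", "Subject", "Date")}

meta = header_metadata(raw)
```

Sender, recipient, subject and timestamp all come out of the header block alone, which matches what the researchers said they could see for every one of Ton’s emails.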
They wouldn’t need to prove anything about whether he read the bulk mail, they are using it to infer that he is e.g. a recent graduate.
Metadata is just as easily collected the old way … by observation, but it takes men and resources to achieve it, and there will be gaps in the data. There has been a stunning intervention in Australia in the last few days where it is obvious that metadata monitoring has been used. The word monitoring is the operative word here. Law enforcement do not collect metadata … they monitor it. The data stream is to the monitoring software what a river is to a trout or salmon: the salmon can always smell its home stream. So too can monitoring applications … no great revelation about XKeyscore lately, it’s been around for a while. What XKeyscore does is sniff the data stream for patterns, patterns that make up words, phrases … like bomb and Donald Duck … assuming Donald was the President. In many ways it is like a traffic cop parked down the road monitoring your speed: if you are not speeding he pays you no mind, if you are he scales up his response … so too does the data sniffer … if you come to its attention then it will take more notice of your device … get really interesting and it will start recording your metadata for analysis. If it becomes apparent that you are not a threat it scales down its response. Is it ethical or not is the debate … I hold the old-fashioned view that if you have nothing to hide, you have nothing to fear … I know for a fact that this comment will scale up the sniffer’s interest in me … but I am OK with that, in fact I am glad it’s there to serve and protect myself and my family.
Absolutely agree with the above comments…all this scaremongering about the naughty governments monitoring Metadata is misinformed (but hey…. it sells the news).
As an example, if any UK Police Force/Agency wants to get more detailed Metadata (from either the US or UK sources…or any country for that matter)…then they need a Secretary of State Warrant.
An analyst can view metadata to see that a suspect (criminal, terrorist, paedophile, money launderer, etc… ) accessed Google BUT NOT the question asked of it;….. likewise the analyst can see Amazon was accessed BUT NOT what was purchased, also that Facebook/Twitter was used to communicate BUT NOT what was in the conversation/text message ….a Secretary of State Warrant is needed (why a Secretary of State?….because the servers of Google, Amazon, Facebook etc… are housed outside the UK and therefore are classed as international communications).
Even for those servers housed in the UK and therefore not deemed International communications, a UK warrant is still needed in accordance with the Regulation of Investigatory Powers Act 2000 (RIPA)….and to get one of those, the Agency/Force needs to show it is proportional and the information cannot be gathered via any other non intrusive means and that it will be actioned upon/used.
Here’s a thought…. do we realise that paedophiles, terrorists, drug criminals, etc…. communicate via e-mail, Facebook, Twitter, smartphones, etc… and they are all mixed up with everyone else’s Metadata.
All together now….”Bulk Access to Metadata DOES NOT mean Mass Surveillance of the population” (but then again…. this statement doesn’t sell the news)
Oh and well done Apple…you have made it much harder/impossible for legitimate law enforcement to get data (which they are legally authorised to do) on the criminals of the world…. Apple iPhone, the paedophiles’ & terrorists’ smartphone of choice!!!!!