Let’s say that, mid-oversharing, I thought better about writing a Facebook post about how the rash has now spread to my … (cue the backspacing, the select all/delete, hitting cancel or whatever it takes to avoid telling the world about that itch).
If that text were a Facebook status update (or a Twitter tweet, a Yahoo email, a comment on a blog or any other typing on a web page), cancelling it wouldn’t necessarily matter: what I wrote could still have been recorded, even if I decided not to post it.
That’s a point brought up on Friday by Jennifer Golbeck, director of the Human-Computer Interaction Lab and an associate professor at the University of Maryland.
Slate published an article Golbeck wrote up about a paper, titled Self-Censorship on Facebook (PDF), that describes a study conducted by two Facebook researchers: Sauvik Das, a PhD student at Carnegie Mellon and summer software engineer intern at Facebook, and Adam Kramer, a Facebook data scientist.
Over the course of 17 days in July 2012, the two researchers collected self-censorship data from a random sample of about 5 million English-speaking Facebook users in the US or UK.
How did they know when one of the Facebook users under their microscope had decided to back out of a post?
That’s simple as pie, really: they used code they had embedded in the web pages to determine if anything had been typed into the forms in which we compose status updates or comment on people’s posts.
To protect users’ privacy, the researchers decided to record “only the presence or absence of text entered, not the keystrokes or content”. That quote serves as a helpful reminder that they could have tracked your keystrokes if they had wanted to.
(Note: logging keystrokes is no super secret, privacy-sucking vampire sauce. It’s plain old Web 1.0. This is not news, but it’s certainly worth repeating: anybody with a website can capture what you type, as you type it, if they want to.)
The researchers tracked that a user had started writing content only if a Facebook user typed at least five characters into a compose or comment box. If the content wasn’t shared within 10 minutes, it was marked as self-censored.
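As a rough illustration of that rule, here is a minimal TypeScript sketch. It is not Facebook’s code: the names (`ComposeSession`, `isSelfCensored`) are hypothetical, and it assumes the 10-minute clock starts when the fifth character is typed, which the study description quoted here doesn’t spell out. Note that only a character count and timestamps are kept, never the text itself.

```typescript
// Thresholds quoted in the article; all identifiers are hypothetical.
const MIN_CHARS = 5;              // "at least five characters"
const WINDOW_MS = 10 * 60 * 1000; // "within 10 minutes"

interface ComposeSession {
  charsTyped: number;        // most characters seen in the box (no content stored)
  startedAtMs: number;       // assumed: when the threshold was first crossed
  sharedAtMs: number | null; // when the post was shared, or null if it never was
}

function isSelfCensored(session: ComposeSession, nowMs: number): boolean {
  // Fewer than five characters never counts as "started writing".
  if (session.charsTyped < MIN_CHARS) return false;

  const deadlineMs = session.startedAtMs + WINDOW_MS;

  // Shared inside the window: not self-censored.
  if (session.sharedAtMs !== null && session.sharedAtMs <= deadlineMs) return false;

  // Otherwise it is self-censored once the window has elapsed.
  return nowMs >= deadlineMs;
}

const elevenMinutes = 11 * 60 * 1000;
const abandoned: ComposeSession = { charsTyped: 29, startedAtMs: 0, sharedAtMs: null };
console.log(isSelfCensored(abandoned, elevenMinutes)); // true
```

A session that is still inside its 10-minute window returns false here, because under this reading of the rule it hasn’t been abandoned yet.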
Why in the world would Facebook, Twitter, or similar care so much about my rash and subsequent decision not to tell the world about it?
While second thoughts come in handy to stop people who might otherwise post truly embarrassing Facebook or other social media content, as far as the social networks themselves are concerned, self-censoring users just starve sites of the content they otherwise feed upon.
From the paper:
Users and their audience could fail to achieve potential social value from not sharing certain content, and the [social network] loses value from the lack of content generation...
... Understanding the conditions under which censorship occurs presents an opportunity to gain further insight into both how users use social media and how to improve [social networks] to better minimize use-cases where present solutions might unknowingly promote value diminishing self-censorship
In her Slate article, Golbeck interprets Facebook’s 17-day collection of self-censorship data for this research to be an invasion of privacy in that, as she writes, “the things you explicitly choose not to share aren’t entirely private.”
The problem with this thinking is that it conflates two things: 1) Facebook’s ability to capture data about users who started typing something but then didn’t publish it, and 2) the incorrect notion that Facebook tracked the content of what users typed.
Could Facebook have captured my need for salve? Absolutely. As I said above, anybody with a website can capture what we type into their website as we type it. It’s the nature of the web.
But the researchers took pains to state that while they did track the presence or absence of text entered, they explicitly did not listen in on the abandoned content; indeed, they tracked neither the keystrokes nor the content entered.
From the paper:
All instrumentation was done on the client side. In other words, the content of self-censored posts and comments was not sent back to Facebook's servers. Only a binary value that content was entered at all.
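To make the distinction concrete, here is a hedged TypeScript sketch of what “only a binary value” could look like. The function and field names are invented for illustration, and the five-character threshold is borrowed from the study’s description; the paper excerpt above doesn’t show the real instrumentation.

```typescript
// Hypothetical sketch: the draft is inspected locally, but only a yes/no leaves the function.
const MIN_CHARS = 5; // threshold described in the study

interface CensorshipReport {
  enteredText: boolean; // the single binary value; no keystrokes, no content
}

function buildReport(draft: string): CensorshipReport {
  return { enteredText: draft.length >= MIN_CHARS };
}

// What would be serialized and sent; the draft itself is not part of it.
const payload: string = JSON.stringify(buildReport("the rash has now spread to my"));
console.log(payload); // {"enteredText":true}
```

The same code could just as easily put `draft` into the report, which is exactly the article’s point: the restraint is a choice, not a technical limit.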
That said, Facebook was still looking over its users’ shoulders in a fashion that would likely come as an unpleasant surprise to many of them.
Golbeck’s conflation isn’t surprising. Particularly given NSA-gate and the heightened awareness about pervasive surveillance it’s bestowed upon us, we’re ready to see eavesdropping governments and their corporate lackeys lurking in every corner of the internet.
But there’s a yawning gap between what people think can and cannot be monitored and what is actually possible.
The reality is that JavaScript, the language that makes this kind of monitoring possible, is both powerful and ubiquitous.
It’s a fully featured programming language that can be embedded in web pages and all browsers support it. It’s been around almost since the beginning of the web, and the web would be hurting without it, given the things it makes happen.
Among the many features of the language are the abilities to track the position of your cursor, track your keystrokes and call ‘home’ without refreshing the page or making any kind of visual display.
Those aren’t intrinsically bad things. In fact they’re enormously useful. Without those sorts of capabilities, sites like Facebook and Gmail would be almost unusable, searches wouldn’t auto-suggest and Google Docs wouldn’t save our bacon in the background.
There are countless examples of useful, harmless things this (very old) functionality enables.
But yes, it also provides the foundation for any sufficiently motivated website owner to track more or less everything that happens on their web pages.
This is the same old web we’ve been using since forever, but a lot of people don’t realize that. When they find out, they’re often horrified.
This was illustrated by a recent news piece about Facebook mulling the tracking of cursor movements (actually, technically, it would be tracking the movement of users’ pointers on the screen) to see which ads we like best.
The comments on that story make clear that many people are utterly creeped out by the idea that websites can track their pointers. One commenter likened pointer tracking to keylogging.
But as Naked Security’s Mark Stockley pointed out in a subsequent comment on that article, none of this is new, and the capability is certainly not confined to Facebook:
If Facebook [wants] to do key logging then [it] can - so long as you're browsing one of their pages they can capture everywhere your cursor goes and everything you type. I'm not saying they do, I've no idea, I'm just saying it's possible - any website can do it and it's very easy.
In fact, as Mark noted in his comment on the pointer-tracking story, if he had decided to ditch the comment he was writing halfway through, the Naked Security site could still have captured everything he typed, even if he’d never hit submit (it didn’t by the way, we don’t do that).
In sum: Facebook spent 17 days tracking abandoned posts in a manner that some might find discomforting, and readers are reminded that the internet allows website owners to be far, far more invasive.
If you want to be sure that nobody is tracking your mouse pointer or what you type then you’ll have to turn off JavaScript or use a browser plugin like NoScript that will allow you to choose which scripts you run or which websites you trust.
Image of backspace key courtesy of Shutterstock.
To make it clear, is the article telling me that a web page can track whatever I type into a textbox on the page, or that it can track what I type to another completely unrelated window, e.g. Notepad? I assume the first one, but the article is not really clear on it.
BTW, Google Chrome offers me the option to “Resolve grammar issues using a web-service.” I assume it means sending whatever I type to any textbox on any page to a spell-checking WS somewhere in the cloud. Of course I have the option disabled.
To clarify: websites can ‘only’ track keystrokes and the position of the mouse pointer on their own web pages.
Unless they used JavaScript to open another window (which they probably shouldn’t), in which case I believe they have more access? Even if the window is on a completely different site…? I think that used to be possible / still is possible – not entirely sure.
I think that would violate the same origin policy. However, I can’t see why a third party script that’s included in a page voluntarily (something like Google Analytics, Tweet buttons etc.) couldn’t do this.
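For readers following this thread, the core of the same origin policy is a comparison of scheme, host and port. Here is a small TypeScript sketch of that comparison using the standard `URL` class. The helper name `sameOrigin` is made up for illustration, and real browsers apply many more rules than this one check.

```typescript
// Hypothetical helper: two URLs share an origin when scheme, host and port all match.
function sameOrigin(a: string, b: string): boolean {
  // URL.origin serializes scheme + host + port (default ports are omitted).
  return new URL(a).origin === new URL(b).origin;
}

console.log(sameOrigin(
  "https://nakedsecurity.sophos.com/article",
  "https://nakedsecurity.sophos.com/cookies-and-scripts/"
)); // true: same scheme, host and port

console.log(sameOrigin(
  "https://nakedsecurity.sophos.com/",
  "https://www.facebook.com/"
)); // false: different host
```

A third-party script pulled into a page with a script tag runs with that page’s origin, so a check like this doesn’t stop it from seeing what happens on the page that included it.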
Thanks for that, I was also wondering.
It can’t track what you type in an external app, like notepad. Only what you type into the form on the web page.
This has never^H^H^H^H^H happened before.
Actually, I backed out of a Sophos comment just yesterday. I was typing something, then realized I didn’t have the facts to back up what I was typing. So I opened another tab to double-check and couldn’t find anything to back my position, so I abandoned the comment. If someone had secretly spied on that comment while I was typing it, whatever. I don’t think that would have revealed anything useful.
Is there a way to identify if a website is tracking my keystrokes?
Or is all the functionality of JavaScript only available on the website’s side?
Everything happens client-side so you are, theoretically, in total control. If you’re feeling really paranoid you can read the code to see what it’s going to do : )
I can’t see why a browser plugin wouldn’t be able to do this. There are plenty of good uses for mouse pointer monitoring and keylogging so you’d want something that asks you if a website is allowed to perform those actions as and when it tries, something like a more fine-grained version of what NoScript does today.
This sounds like an ideal project for our datababy experiment! Do you reckon a browser plugin could detect if actual content was being sent prior to being deleted?
I’m sure it could, or perhaps a custom proxy server.
I wrote a plugin for the older Opera browser that disrupts virtually all methods of tracking, and many exploit vectors. This includes key logging, font enumeration, fingerprint tracking, heap spraying, and many others. I simply scan and scrub the page, running it through a large series of regex matches for any CSS, HTML or JavaScript code that would be used for those types of exploits.
Unfortunately it no longer works with the newer Opera. Someday I will rewrite it as a Chrome-style plugin if possible.
My plugin (which was never released to the public, due to being kind of slow) is the only one I’ve ever come across that does this type of thing. I’ve often wondered why.
Not so much keylogging but more on tracking. I use a Firefox plugin (Ghostery) which identifies all tracking cookies found on sites and gives the option to block them. Sure, I found ten tracking cookies for this site alone. A bit extreme to have so many monitoring users. Why have a topic on tracking when this site is at it itself?
Facebook social plugin
Google +1
Google Analytics
Gravatar
Linkedin widget
Polldaddy
Quantcast
Reddit
Twitter button
Wordpress stats
Details of the cookies set on Naked Security, what they’re for and how to avoid them if you don’t want them are available on our Cookies and Scripts page.
http://nakedsecurity.sophos.com/cookies-and-scripts/
Firstly, a big Thank You for reminding users about all this monitoring of a website’s visitors.
Of course all of this is done to give “the user a ‘better experience’ on our site”…
I suppose the same goes for reading your History. Possibly your Favourites too.
Everyone has noticed that many (most) sites know your OS. It seems many can’t make the jump to realizing if a site can monitor what OS (video resolution, etc) you are using they also have the ability to monitor other things.
And let’s not even discuss the greater size of the ‘cookie’ file (it seems the minimum size now is 4Kb) or the Super Cookie (Flash or Silverlight) which can hold 20 times as much information and can be shared with other sites.
Some disambiguation:
History and Favourites can’t be read by websites. There have been hacks to get your history (or at least to find out if a specific website is in your history) but I think browsers are on top of them in the main.
Your OS is sent as part of the ‘user-agent’ header in every single HTTP request and has been since forever. Notionally it describes your browser and OS but some get quite esoteric. IP address + user-agent is often used to identify a ‘unique visitor’ (but not who that visitor is) from web logs for tracking purposes. Since the user-agent header is sent by the browser it’s actually under users’ control and most browsers allow you to change it. There are plugins that will allow you to do it as well.
Screen resolution, window size etc. are available through JavaScript. This isn’t personally identifiable information and it’s quite useful for website owners. It’s better for everyone if sites are designed to fit the size of windows and monitors users typically use. If you don’t want to share this you need to switch off JavaScript.
When it comes to cookie size you’re behind the times my friend! HTML5 has a feature called ‘local storage’ which allows websites to store several megabytes (typically around 5 to 10MB, depending on the browser) of whatever they like in your browser.
Cookies can only be shared where two or more sites use a common 3rd party to set the cookie; you can’t just look at another site’s cookies unbidden. Ghostery or Lightbeam will tell you all about these 3rd party cookies and block them if you want.
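The ‘unique visitor’ counting mentioned above (IP address plus user-agent) can be sketched in a few lines of TypeScript. Everything here is illustrative: the `LogEntry` shape, the function name and the sample values (addresses from documentation-only IP ranges, shortened user-agent strings) are assumptions, not a real log format.

```typescript
// Hypothetical web-log record; real logs carry many more fields.
interface LogEntry {
  ip: string;
  userAgent: string;
  path: string;
}

// Counts distinct (IP, user-agent) pairs. This says how many visitors look
// different from each other, not who any of them are.
function countUniqueVisitors(entries: LogEntry[]): number {
  const seen = new Set<string>();
  for (const entry of entries) {
    // JSON-encode the pair so the two fields can't run together ambiguously.
    seen.add(JSON.stringify([entry.ip, entry.userAgent]));
  }
  return seen.size;
}

const sampleLog: LogEntry[] = [
  { ip: "192.0.2.10", userAgent: "Mozilla/5.0 (X11; Linux x86_64) Firefox", path: "/" },
  { ip: "192.0.2.10", userAgent: "Mozilla/5.0 (X11; Linux x86_64) Firefox", path: "/cookies-and-scripts/" },
  { ip: "192.0.2.10", userAgent: "Opera/9.80 (Windows NT 6.1)", path: "/" },
  { ip: "198.51.100.7", userAgent: "Mozilla/5.0 (X11; Linux x86_64) Firefox", path: "/" },
];

console.log(countUniqueVisitors(sampleLog)); // 3
```

Two people behind the same IP address with identical browsers would be counted once, and one person who changes their user-agent would be counted twice, which is why this is only a rough identifier.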
Thanks for keeping me up to date. I am only vaguely familiar with HTML5 and knew there was local storage. 10 MB… hmm
As for the History, the only browser I have seen to show me that it explicitly denies access to History is Opera (versions <= 12.16). In opera:config there is a toggle: "Allow Scripts To Navigate In History".
Since moving to Linux a few months ago I am getting used to FF which will delete History on closing. That seems a different story.
JavaScript can navigate through the history (go back or forwards any number of steps) but cannot read it.
In HTML5 JavaScript can also read and modify the history but this is almost certainly bound by the same origin policy, so JavaScript on nakedsecurity.sophos.com could only modify the history items relating to nakedsecurity.sophos.com.
Does this function also work with the password form field? I hope not, but if it does, the client sending a hash as authentication is a moot point.
Yes it can. The client doesn’t send a hash, it sends the password in plain text (hopefully over SSL so it can’t be read ‘on-the-wire’) and any hashing, if it’s done at all, is done on the server.