YouTube is reading text in users’ videos

Google keeps tabs on much of your activity, including your browsing history and your location. Now it turns out that its YouTube service is also reading the text that appears in your videos.

Programmer Austin Burk, who goes by the nickname Sudofox, noticed the behavior after finding a cross-site scripting (XSS) flaw on another site.

To disclose the flaw responsibly, he uploaded an unlisted video of the exploit to YouTube so that he could show it to the relevant parties.

The video displayed a URL under his control that he was using to test his XSS exploit. After uploading the video, he checked to ensure that no one had visited the URL, only to find several hits from a user agent called Google-Youtube-Links. A user agent is the calling card that software uses to announce what program it is when it visits a URL. He could come up with only one explanation:

It was then that I realized that during the video, those URLs were visible in the address bar. It seemed that YouTube had run OCR (optical character recognition) across my entire video and decided to crawl the links within.

YouTube offers several visibility settings to people uploading videos to its site. An unlisted video can be viewed by anyone who has the link, but it won’t surface in YouTube’s searches or recommendations. A private video can be viewed only by people the uploader specifically invites.

To be sure that he hadn’t made a mistake, he ran an experiment: he uploaded a private video showing a URL on his own domain that pointed to a folder and file that don’t exist. About five minutes after he uploaded the video, the nonexistent URL got several hits from the same Google-Youtube-Links user agent.
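For readers who want to run a similar check on their own server, here is a minimal sketch of the idea in Python. It is not Burk's method, which the article doesn't detail. It assumes your web server writes access logs in the common "combined" format, and the helper names (make_canary_path, find_hits), the /canary prefix and the sample log lines are all made up for illustration.

```python
import re
import secrets

# Assumed layout (combined log format):
# ip ident user [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def make_canary_path(prefix="/canary"):
    """Return a random path that nothing else should ever request."""
    return f"{prefix}/{secrets.token_hex(8)}.html"


def find_hits(log_lines, path, agent_substring="Google-Youtube-Links"):
    """Yield (time, ip, agent) for requests to `path` whose user agent
    contains `agent_substring` (case-insensitive)."""
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if (match["path"] == path
                and agent_substring.lower() in match["agent"].lower()):
            yield match["time"], match["ip"], match["agent"]


# Made-up sample log lines, using documentation-only IP addresses.
sample = [
    '203.0.113.7 - - [01/Jan/2019:12:00:00 +0000] '
    '"GET /canary/abc.html HTTP/1.1" 404 153 "-" "Google-Youtube-Links"',
    '198.51.100.2 - - [01/Jan/2019:12:01:00 +0000] '
    '"GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
hits = list(find_hits(sample, "/canary/abc.html"))
```

The point of the random path is that no person or ordinary crawler has any reason to request it, so any hit in the log suggests that the URL was read from wherever you displayed it.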

This spooked Burk, who says that it could cause problems for security researchers disclosing a vulnerability. One scenario he suggests is a security researcher using a private YouTube video to disclose an SQL injection vulnerability. The researcher might use the video to disclose the malicious URL, but neither the researcher nor the company they were disclosing to would want to visit it, because it would trigger the attack, perhaps dropping tables. However, if YouTube scraped the URL in the video and visited it, it could trigger the flaw, he suggests.

We sympathize with Burk, who expressed his concerns thus:

Honestly, I find this rather unsettling – especially for using private or unlisted YouTube videos as a way to quickly upload a video to disclose a vulnerability. I’m sure you can think of other scenarios in which this would be undesired, especially as we don’t know why it’s taking place or where those URLs will end up.

But the principles of responsible disclosure and the nature of YouTube might be mutually exclusive.

As our own Paul Ducklin points out, encrypted communications that you control yourself make much more sense:

It would be nice if Google didn’t do this sort of video URL scraping, but if you’re a security researcher planning to send a responsible disclosure report privately to a vendor… well, I’m simply not convinced that YouTube is a suitable messaging medium – even if the vendor is Google itself.

Nevertheless, this news uncovers another little-known detail about how YouTube handles your data.

How many more unknowns are there, and what are their implications? Does YouTube know the vehicle license plate that showed up in your last video, or the slogan that was on your T-shirt? What about other video sites?

We just don’t know, and that’s pretty creepy.

On the other hand, if Google doesn’t do due diligence on the videos that people upload, it leaves itself open to accusations that it isn’t adequately protecting people from illicit or offensive content online. That due diligence could easily include reading and checking out URLs in videos.

In any case, this serves as a reminder not to upload any private information to a free public site, even if your posting is set to private.