Google keeps tabs on much of your activity, including your browsing history and your location. Now it turns out that its YouTube service is also reading the text in your videos.
Programmer Austin Burk, who goes by the nickname Sudofox, stumbled on the issue after finding a cross-site scripting (XSS) flaw on another site.
In an attempt to responsibly disclose it, he uploaded a video of the exploit to YouTube as an unlisted video so that he could show it to the relevant parties.
The video displayed a URL under his control that he was using to test his XSS exploit. After uploading the video, he checked to ensure that no one had visited the URL, only to find several hits from a user agent called Google-Youtube-Links. A user agent is the calling card that software uses to announce what program it is when it visits a URL. He could come up with only one explanation:
It was then that I realized that during the video, those URLs were visible in the address bar. It seemed that YouTube had run OCR (optical character recognition) across my entire video and decided to crawl the links within.
YouTube offers several privacy settings to people uploading videos to its site. Unlisted lets anyone with the link view the video, but won’t surface it in YouTube’s searches or recommendations. Private lets people view the video only if the uploader specifically invites them.
To be sure that he hadn’t made a mistake, he ran an experiment: he uploaded a private video showing a URL for a folder and file on his own domain that didn’t exist. About five minutes after he uploaded the video, the nonexistent URL got several hits from the same Google-Youtube-Links user agent.
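For readers who want to repeat this kind of check on their own server, here is a minimal sketch of how one might search a web server access log for visits from a given user agent. It is not Burk's method, which the source doesn't describe. It assumes the server writes the common "combined" log format, and the helper name hits_for_agent, the sample paths, IP addresses and timestamps are all made up for illustration. The full user agent string YouTube sends may also be longer than the name quoted above, so the sketch matches on a substring.

```python
import re

# Assumed log layout (Apache/nginx "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def hits_for_agent(lines, agent_substring, path_prefix=""):
    """Return (time, host, path, status) for each request whose user agent
    contains agent_substring and whose path starts with path_prefix."""
    needle = agent_substring.lower()
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the assumed format
        if needle in match["agent"].lower() and match["path"].startswith(path_prefix):
            hits.append(
                (match["time"], match["host"], match["path"], int(match["status"]))
            )
    return hits


# Hypothetical log lines; addresses come from ranges reserved for documentation.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2019:12:00:00 +0000] "GET /canary-dir/canary.html HTTP/1.1" 404 153 "-" "Google-Youtube-Links"',
    '198.51.100.7 - - [01/Jan/2019:12:00:05 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '192.0.2.11 - - [01/Jan/2019:12:01:00 +0000] "GET /canary-dir/canary.html HTTP/1.1" 404 153 "-" "Google-Youtube-Links"',
]

if __name__ == "__main__":
    for hit in hits_for_agent(SAMPLE_LOG, "Google-Youtube-Links", "/canary-dir/"):
        print(hit)
```

Because the test URL appears nowhere except in the video, any request for it is a sign that something read the address off the screen. A 404 status in the log simply confirms that the page never existed.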
This spooked Burk, who says that it could cause problems for security researchers disclosing a vulnerability. One scenario he suggests is a security researcher using a private YouTube video to disclose an SQL injection vulnerability. The researcher might use the video to disclose the malicious URL, but neither the researcher nor the company they were disclosing to would want to visit it, because it would trigger the attack, perhaps dropping tables. However, if YouTube scraped the URL in the video and visited it, it could trigger the flaw, he suggests.
We sympathize with Burk, who expressed his concerns thus:
Honestly, I find this rather unsettling – especially for using private or unlisted YouTube videos as a way to quickly upload a video to disclose a vulnerability. I’m sure you can think of other scenarios in which this would be undesired, especially as we don’t know why it’s taking place or where those URLs will end up.
It would be nice if Google didn’t do this sort of video URL scraping, but if you’re a security researcher planning to send a responsible disclosure report privately to a vendor… well, I’m simply not convinced that YouTube is a suitable messaging medium – even if the vendor is Google itself.
Nevertheless, this news uncovers another little-known detail about how YouTube handles your data.
How many more unknowns are there, and what are their implications? Does YouTube know the vehicle license plate that showed up in your last video, or the slogan that was on your T-shirt? What about other video sites?
We just don’t know, and that’s pretty creepy.
On the other hand, if Google doesn’t do due diligence on the videos that people upload, it leaves itself open to accusations that it isn’t protecting people adequately from illicit, offensive content online. That due diligence could easily include reading and checking out URLs in videos.
In any case, this serves as a reminder not to upload any private information to a free public site, even if your posting is set to private.
12 comments on “YouTube is reading text in users’ videos”
Not wise on YouTube’s part … What if the video shows the URL of a place that should be avoided at all cost for security reasons? Haven’t they heard of the good old “don’t click on the link if you don’t know who is sending it”? They could put their own servers at risk.
Not really, you can pretty much make Google look up any page you want, Google is not going to get a virus out of that
I think you’ll find that’s why they’re doing it, with inspection tools to ensure people aren’t being phished/directed to malware.
Next time, record the video to your hard drive, upload it to a Jottacloud account and share a link to the video with yourself. When someone visits that link it will open in the Jottacloud player.
Jottacloud are bound by Norwegian law and cannot do things like this.
Just because you shouldn’t doesn’t mean you can’t though!
Spooky. But with governments pressuring social media companies to perform the nearly impossible task of curating all submissions in order to eliminate extremist content “or else”, this shouldn’t really come as a surprise.
I guess the moral of the story is (still) don’t blindly assume your activities aren’t being monitored. It’s interesting that even security researchers fall into that trap. However, I will say that the extent of the monitoring and the abilities of the monitors continually amazes me.
Personally, I think the OCR and text reading is necessary to filter and ban content according to YT policy. What we don’t know is whether the extracted data is also being used for other purposes.
Just my opinions.
I noticed this about a year ago when searching Google for something obscure. The text appeared nowhere on the YouTube page itself, but the search term was visible in the video.
Give YouTube some credit. At least they identified themselves by the User-Agent. They could have used something like “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36” which is how the Chrome browser identifies itself.
Of course, Burk could have identified them by their IP address in that case.
YouTube is Google. No surprise to me.
Just goes to show why you don’t want to video record without permission from those you are recording. And video recording sessions need to be planned out to avoid giving away stuff like tickets, credit card numbers, id badge info, passwords, etc. Same goes for photos.