Weird and wonderful: the "Webdriver Torso" mystery videos explained - and remystified!

An esoteric mystery has taken the internet by storm!

If you're not up to speed with the latest in conspiracy theory, you need to know about the eerie YouTube sensation known as Webdriver Torso.

It's a YouTube channel with nearly 80,000 videos, pumped out in batches.

Each one consists of ten peculiar frames, each with two randomly sized and placed rectangles, one blue and one red.

Sometimes you can't see the blue rectangle: it seems that the red rectangle is always drawn second, and occasionally eclipses the blue one entirely.

In the bottom left hand corner is the enigmatic text aqua.flv - Slide 000x, where x runs from zero to nine in each clip.

As each clip appears on the screen, a one-second beep of random pitch can be heard.

Except for the last image, which is cut short.

Here are the frames and tone frequencies from a recent upload to the Webdriver Torso channel:

Listen to the tones from tmpdKHvbS

Who? Why? What for? What does it mean?

The obvious answer, apparently, if you know your Cold War history, is that it's a modern-day Numbers Station.

No-one quite knows what Numbers Stations were for, although it seems pretty obvious.

They pumped out creepy-sounding chains of dispassionately enunciated letters or numbers, sometimes 24 hours a day, from high-powered shortwave transmitters.

There was the Lincolnshire Poacher, named after the musical identification signal it used:

Listen to a fragment from the Lincolnshire Poacher

And there was YHF, or Yankee Hotel Foxtrot, with its oft repeated signal identification:

Listen to code groups from Yankee Hotel Foxtrot

Doesn't that make the hairs on your arm stand on end?

The smart money, of course, assumes that Numbers Stations delivered encrypted messages over a public medium for undercover agents of various countries.

Interestingly, it seems that pinpointing a very high-powered shortwave transmitter by triangulation is surprisingly difficult, even (or perhaps especially) if it's in your backyard.

→ Triangulation means taking two directional readings of a distant object - a gun battery, for instance, or the source of a radio signal - from two different, known points and plotting the readings as lines on a map. Since two lines that aren't parallel meet at one and only one point, the intersection of the lines locks in the location of the target. The two known points and two known angles uniquely define a triangle, giving the name triangulation. More than two readings can be used to reduce error and improve precision.

To paraphrase one shortwave aficionado, "Because high-powered shortwave transmissions travel so far, the legs of the triangle are pretty long."

So, zooming in accurately on the end of two of the legs isn't as easy as it sounds.
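
To see why long legs hurt, here is a rough sketch of triangulation on a flat map, using awk for the arithmetic. The function name triangulate, the flat-map geometry and the example distances are assumptions made purely for illustration; real shortwave direction-finding also has to cope with a curved earth and with signals bouncing off the ionosphere.

```shell
#!/bin/sh
# triangulate X1 Y1 BEARING1 X2 Y2 BEARING2
# Flat-map sketch: positions in km, bearings in degrees clockwise from north.
# Prints the x and y (in km) of the point where the two bearing lines cross.
triangulate() {
  awk -v x1="$1" -v y1="$2" -v b1="$3" \
      -v x2="$4" -v y2="$5" -v b2="$6" 'BEGIN {
    rad = atan2(0, -1) / 180            # degrees to radians
    dx1 = sin(b1 * rad); dy1 = cos(b1 * rad)
    dx2 = sin(b2 * rad); dy2 = cos(b2 * rad)
    den = dx1 * dy2 - dy1 * dx2         # zero when the bearings are parallel
    if (den == 0) { print "parallel bearings"; exit 1 }
    # Distance along the first bearing line to the crossing point.
    t = ((x2 - x1) * dy2 - (y2 - y1) * dx2) / den
    printf "%.1f %.1f\n", x1 + t * dx1, y1 + t * dy1
  }'
}

# Two listening posts 100 km apart, both looking almost due north.
triangulate 0 0 1.0 100 0 359    # the two bearings as measured
triangulate 0 0 1.5 100 0 359    # first bearing off by half a degree
```

With these made-up numbers, the first fix lands a little under 3000 km away, and the half-degree change in one bearing moves it by more than 500 km. The narrower the triangle, the more a tiny angular error matters.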

That just adds to the mystique of Numbers Stations: probably organised and paid for by national governments; mysteriously absent from any official lists of licensed transmitters; hiding in plain sight; and broadcasting spooky content to the entire world with the intention that perhaps only a single person might ever want or need to receive it.

Numbers Stations in the internet era

Even in today's internet era, shortwave has some handy properties:

  • With the right atmospheric conditions (the radio waves bounce back to earth off the ionosphere), a single transmitter can reach most of the earth.
  • Shortwave receivers are easier to acquire and use than computers and modems, and have much more modest power requirements.
  • No routers, repeaters or service providers are needed to get the signal from sender to the receiver.

There is a not-inconsiderable disadvantage, however, compared to the internet:

  • The receiver can't reply by the same medium. (Shortwave transmitters are more complex, expensive and incriminating than a handheld radio or laptop.)

So, if you wanted the internet equivalent of a Numbers Station - an openly visible way of broadcasting clandestine data - then placing abstruse videos in a single, weirdly-named YouTube channel would be a peculiarly bad way of doing it.

It's absurdly inefficient, and (in a repressive country with strict internet logging and filtering) surprisingly easy to detect when someone has been "tuning in."

There are so many other, better, less suspicious places online to stash secret data in plain sight - just think how many apparently unexceptionable websites already pump out otherwise random content, notably in cookies and other tracking codes, that could be put to a dual purpose.

Having said that, here is Sophos Naked Security's very own YouTube Number Station sensation, as a bit of weekend fun.

What does it mean? Do the rectangles carry any data? Is the soundtrack trying to tell you anything?

If you think you know the answers, tell us in the comments. (Use your imagination. We did.)

Do-it-yourself Webdriver Torso videos

If you'd like to make your very own "Webdriver Torso" style video clips, here are some tips on how to do it.

We used the command-line tools Sox, Graphics Magick and FFMPEG, which you should be able to acquire for most operating systems.

(If you can't get Sox to work on Windows, you can use the Generate | Tone menu option in the graphical audio editor Audacity instead.)

To produce a one-second pure sine wave tone, use sox:

$ sox -n tone.wav synth 1 sine 440

To reproduce a pseudo-random sequence, try this:

$ sox -n 1.wav synth 1 sine 1327 
$ sox -n 2.wav synth 1 sine 1172 
$ sox -n 3.wav synth 1 sine 737 
$ sox -n 4.wav synth 1 sine 827 
$ sox -n 5.wav synth 1 sine 1110 
$ sox -n 6.wav synth 1 sine 995 
$ sox -n 7.wav synth 1 sine 592 
$ sox -n 8.wav synth 1 sine 649 
$ sox -n 9.wav synth 1 sine 990 
$ sox -n a.wav synth 1 sine 879 
$ sox -n b.wav synth 1 sine 554 
$ sox -n c.wav synth 1 sine 655 
$ sox -n d.wav synth 1 sine 885 

You can stitch them all together into a 13-second clip with:

$ sox 1.wav 2.wav 3.wav 4.wav 5.wav 6.wav 7.wav \
   8.wav 9.wav a.wav b.wav c.wav d.wav whatisit.wav

You'll be able to tell if you did it correctly by listening to whatisit.wav. (It isn't random, in fact, or even pseudo-random.)
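
If typing out the commands one by one feels tedious, a small loop can write them for you. This is only a sketch: the function name tone_commands is made up, and it prints the command lines rather than running them, so you can check them before anything touches your disk.

```shell
#!/bin/sh
# The thirteen frequencies and file names used in the commands above.
freqs="1327 1172 737 827 1110 995 592 649 990 879 554 655 885"
names="1 2 3 4 5 6 7 8 9 a b c d"

# Print the sox command lines instead of running them.
tone_commands() {
  set -- $names                 # positional parameters now hold the names
  for hz in $freqs; do
    printf 'sox -n %s.wav synth 1 sine %s\n' "$1" "$hz"
    shift
  done
  # Last line: join the tones, in order, into one file.
  printf 'sox'
  for n in $names; do printf ' %s.wav' "$n"; done
  printf ' whatisit.wav\n'
}

tone_commands
```

Running `tone_commands | sh` executes the lines (assuming sox is on your PATH). Afterwards, `soxi -D whatisit.wav` should report the length of the clip in seconds, which ought to be one second per tone.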

To produce the random blue/red rectangles, use Graphics Magick.

We chose 640x360 (the resolution at which we received the Webdriver Torso videos), and the command below draws the blue rectangle first, then the red, and finally the text:

$ gm convert -size '640x360' xc:white \
   -fill blue -draw 'rectangle 75,85 175,95' \
   -fill red  -draw 'rectangle 10,10 50,50' \
   -font 'Courier-Bold' -fill black \
   -draw 'text 10,340 "aqua.flv - Slide 0000"' \
   image0.png

Now assume you have ten images, image0.png to image9.png, and ten one-second sound tones, tone0.wav to tone9.wav.
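
If you don't have those twenty files yet, here is a minimal sketch of one way to produce them. Everything specific in it is an assumption made for illustration: the function name torso_commands, the optional seed argument, the 500 to 2500 Hz pitch range and the way the rectangles are sized. It says nothing about how the real channel picks its values. Like a dry run, it prints the gm and sox command lines instead of executing them.

```shell
#!/bin/sh
# torso_commands [SEED]
# Prints (but does not run) ten gm and ten sox command lines.
# Assumptions for this sketch: the function name, the seed argument,
# the 500-2500 Hz pitch range and the way the rectangles are sized.
torso_commands() {
  awk -v seed="${1:-1}" -v q="'" 'BEGIN {
    srand(seed)
    W = 640; H = 360
    for (i = 0; i < 10; i++) {
      # Blue first, then red, so red always ends up on top.
      printf "gm convert -size %dx%d xc:white", W, H
      printf " -fill blue -draw %srectangle %s%s", q, rect(), q
      printf " -fill red -draw %srectangle %s%s", q, rect(), q
      printf " -font Courier-Bold -fill black"
      printf " -draw %stext 10,340 \"aqua.flv - Slide %04d\"%s", q, i, q
      printf " image%d.png\n", i
      # One-second tone with a pitch between 500 and 2500 Hz inclusive.
      printf "sox -n tone%d.wav synth 1 sine %d\n", i, 500 + int(rand() * 2001)
    }
  }
  # Random rectangle inside the frame, as "x0,y0 x1,y1" with x0<x1 and y0<y1.
  function rect(   x0, y0, x1, y1) {
    x0 = int(rand() * (W - 1)); x1 = x0 + 1 + int(rand() * (W - 1 - x0))
    y0 = int(rand() * (H - 1)); y1 = y0 + 1 + int(rand() * (H - 1 - y0))
    return x0 "," y0 " " x1 "," y1
  }'
}

torso_commands 42
```

Once the output looks sensible, `torso_commands 42 | sh` runs the lines, assuming gm and sox are installed. Using the same seed again should give the same set of commands on the same awk, which is handy if you want to regenerate a clip.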

Knit the tones together into a single WAV file:

$ sox tone[0-9].wav torso.wav

Now knit the images and the combined sound file into a video with ffmpeg:

$ ffmpeg -r 1 -i image%d.png -i torso.wav torso.mov

You can change the extension .mov if you want to produce a different video file type, and change the frame size of the final video by using the command line option -s WIDTHxHEIGHT (placed before the output filename).
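
As a hedged illustration of both tweaks at once, the sketch below builds the ffmpeg command line for a given size and output file. The function name video_command and its defaults are invented for this example, and it only prints the command so you can inspect it before running it.

```shell
#!/bin/sh
# video_command [SIZE] [OUTFILE]
# Prints an ffmpeg command line like the one above, with -s added.
# SIZE is WIDTHxHEIGHT; the function name and defaults are illustrative.
video_command() {
  size=${1:-640x360}
  out=${2:-torso.mov}
  case $size in
    *[!0-9x]* | x* | *x | *x*x*)
      echo "size must look like 640x360" >&2; return 1 ;;
    *x*) ;;                               # digits, one x, digits: accept
    *)  echo "size must look like 640x360" >&2; return 1 ;;
  esac
  # -s is an output option, so it goes before the output file name.
  printf 'ffmpeg -r 1 -i image%%d.png -i torso.wav -s %s %s\n' "$size" "$out"
}

video_command 320x180 torso.mp4
```

Pipe the printed line to sh to run it. Which container types work depends on how your copy of ffmpeg was built.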

Enjoy!

And don't think that, by deconstructing the Webdriver Torso phenomenon into a few simple command lines, we have ruined the mystery.

We've only told you how to make videos that are like the Webdriver ones.

In a way, we've cast the mystery into even sharper relief!

And don't forget to tell us your theories about our very own Number Station Sierra November Sierra in the comments below...

35 Responses to Weird and wonderful: the "Webdriver Torso" mystery videos explained - and remystified!

  1. Pete · 170 days ago

    What if not all the rectangles are randomly sized? What if some of them actually correspond to a particular set of rectangles that are being searched for throughout the videos at high speed by another computer? What if once it finds a set of rectangles it's looking for, it uses the spoken words and tones to decrypt a pre-selected codekey of set orders/information?

    ...or it's just some [family friendliness intervention] with too much time on his hands cluttering up YouTube with mindless drivel.

    Either way. :)

  2. rakso75 · 170 days ago

    Numbers station by Webdriver Torso... numbers... maybe.. Lost (the movie)?

  3. mw · 170 days ago

    [S|ierra]ophos[N|ovember]aked[S|sierra]ecurity

    • Paul Ducklin · 170 days ago

      Got any more evidence to support that theory?

      (I like it, as theories go, but I might be prejudiced...I'd like a bit more than just 'SNS = Sophos Naked Security.' :-)

      • mw · 170 days ago

        nope.... it just jumped at me in the context.... ;) maybe i'll find some reasonable explanation....

      • Mark · 170 days ago

        you just removed [redacted] and then read out [redacted] in phonetics! simples! or [redacted]!

        • Paul Ducklin · 170 days ago

          Indeed we did :-)

          What did you think of the "Torso-like" rectangles? Hidden meaning there? Or a bunch of red (and blue) herrings?

      • The name of the station is obvious, and as for its content:
        Security advice, this time not encoded in techtalk, but in numbers.

        • Paul Ducklin · 170 days ago

          OK, so what are these numbers? Convince me it's not aliens! (Of course, smart aliens would be able to write techtalk indistinguishable from ours. But surely then they'd be smart enough to get the content to appear in the article itself, rather than locked up incomprehensibly in a video?

          Unless, in the same way Ford Prefect chose what he thought was an unremarkable name for a human, they misjudged the purpose and remit of Earth's social network services. Ahem. Possible hint: _S_ocial _N_etwork _S_ervices.)

      • Hearth · 168 days ago

        For SNS I get as far as decoding the audio to "SPHS NKDS CRTY" or SoPHoS NaKeD SeCuRiTY sans vowels.

        I got nothing for the boxes ... yet...

        As to Webdriver Torso, could it be a bizarre control station for some botnet, perhaps?

        Either way, the size and shape of the boxes could be irrelevant if, say, an automated system is looking for colours of specific pixels in each frame. Each pixel is then a 3 bit switch that could be automatically read. The audio could be some indicator as to the relevant pixel location, or a vector shift to the next position. Purely speculation of course.

  4. Hi there, I really like this article but I believe there is some evidence that if they are randomly generated, the rule is more complicated than that. First of all, the blue rectangle is always behind the red one. I have seen thousands of them, and in none of them have I seen the opposite.

    Secondly, the size of the rectangles is not totally random: if it were, there would be at least one video in which the shapes do not touch at all. Instead, so far in all of them there is at least one slide in which the red rectangle partially or totally covers the blue one, as if to confirm that the red ones are always in the front and the blue ones in the back.

    • Paul Ducklin · 170 days ago

      Errrrr, the issue of the blue one being behind the red one - a "rule" as far as can be told by a process of deduction - is covered in the article: "Sometimes you can't see the blue rectangle: it seems that the red rectangle is always drawn second, and occasionally eclipses the blue one entirely."

      (Of course, that means the positioning isn't totally random, at least in *three* dimensions, because the depth is pre-decided, with blue below and red on top every time.)

      As for whether you would expect an overlap in at least one out of ten slides...you need to do some probability calculations to decide whether that's surprising.

      If you work out the probability that there is no overlap in any one slide, you can easily work out the probability that there is no overlap in 10 slides. Gut feelings in probability often lead you astray (see the Birthday Paradox or the Monty Hall Problem) but my stomach is telling me no overlap in 10 slides is unlikely...

  5. kinga · 170 days ago

    The Russia and Ukraine thing that's going on: red, white and blue are the colours of a certain flag, Russia's to be exact. Spies getting info? Sure, this channel has been running since September and was discovered in February, so it has been running for a long time(ish).

    • Paul Ducklin · 170 days ago

      Red, white and blue? What about France? The Kingdom of the Netherlands? The United Kingdom of Great Britain and Northern Ireland? Luxembourg? (OK, different sort of blue.) Chile? The United States of America?

      What about Poland? (No blue, but perhaps that's why the red is always on top - sometimes you can see the blue, which might be, say, the European Union, and sometimes it has been eclipsed by the red :-)

      Readers are invited to provide their own list of red/white/blue flags.

  6. DK · 170 days ago

    What if it's Skynet? Or fragments of data that eventually add up to viral activity in the right environment? Has anyone who has watched all of these checked for unidentified memory residents?

  7. ScottK · 170 days ago

    One of the theories I heard about this rectangle video mystery was that it was a company testing its image recognition software, or to calibrate it to YouTube's compression methods.

    • Paul Ducklin · 170 days ago

      Apparently that has been debunked. The bloke who said he'd seen it before - turned out it wasn't the same (albeit that it was similar) after all.

      Testing Google's real-world transcoder behaviour with such simplistic data (rectangles of solid colour of 0xFF0000 and 0x0000FF plus 10 pure sine waves) would be rather pointless IMO.

  8. Josh · 170 days ago

    There's also Yosemite Sam. He was a radio station in New Mexico that sent out a data packet then a clip of Sam from a cartoon.

    Google it. It's strange.

    • Anonymous · 170 days ago

      Not creepy, like Yankee Hotel Foxtrot. But strange indeed! The backstory, if any of it is true, is very cool. As you say...Searchengine it.

  9. Hi!

    I analysed the sound of 20 consecutive videos uploaded 3 weeks ago (from tmpkQwhRW to tmpwxm2CP). I measured the frequency of each beep (so 20x10 beeps in total because each one has 10 beeps).

    I could notice that all beeps are between 500 Hz and 2500 Hz.

    [comment edited for length]

    ...the frequencies are evenly spread (the average distance between two points is not increasing on the right), mean[ing] that the beeps are not musical notes.

    ...no particular chronology appears.

    ...beep successions seem to be completely random too.

    From that analysis, we can deduce 3 possibilities:

    - either the sound of the video is really random, so meaningless and useless;
    - or there is a small amount of data hidden among fake beeps;
    - or it is well-encrypted data (data entropy is destroyed so the data looks random).

    • Paul Ducklin · 170 days ago

      500 Hz to 2500 Hz, eh? (Will fit through the bandpass filter of a landline telephone :-)

      Good analysis, but...

      ...what about Number Station Sierra November Sierra?

  10. Albertus · 170 days ago

    each tone is a password to decrypt a steganographic video - 1 slide for 1 second = 11 seconds for 11 pictures ... any tool to automate this?

    • Paul Ducklin · 170 days ago

      Except there are only 10 images, and the last one only shows up for a fraction of a second. (Listen to the recording above.)

      Also, I'd say it'd be kind of tricky to maintain steganographic data in an image through Google's transcoding process, which is subject to change at any time.

      (Unlessssssssssss....that's what is being tested, of course :-)

  11. Dee B. Pickus · 170 days ago

    Well, it is obviously from Aliens in another galaxy.

  12. Mark · 170 days ago

    It's some kind of phase-shift keying, and similar to Close Encounters.
    Uses two colours and tones that represent each letter?
    Similar to this common radio mode. However, the text may be encrypted.
    It's all very interesting though! And breaking it down into each tone and uploading it as a video file: how long would it take to send a private message that the NSA couldn't decode?? Or maybe they already have?
    But doing it by YouTube instead of pirating a radio frequency?
    Hmmm Interesting.

  13. Albertus · 170 days ago

    the name of each video also looks like a password ... did you try to decipher the video?

  14. rakso75 · 169 days ago

    I took a look at comments in the web about this "mystery" and it really looks like a non-French prankster living in France (probably Paris) having some fun laughing at the world wondering what the heck all this is about...

    If the mystery is solved, probably a lot of people would be disappointed; if not, it will become one of those big mysteries of mankind (which should teach us a lesson or two about several past mysteries that people are still trying to "decipher")

  15. Hatch · 164 days ago

    We have several potential points of data here. For the images:
    The position of each rectangle, the volume of each rectangle, or the position location of each corner of each rectangle is a finite calculable number that could represent something.
    Also the positional relationships between the squares could be data points.

    The numbers and letters following tmp in the file names: What do they look like in ASCII, Hex, or binary for that matter? And again, the relationship between each digit can be telling...

    The tones seem pretty mundane in and of themselves. Are the tone frequencies exact? For example, 2501 Hz vs 2500. The duration and volume don't seem to vary, but nothing is exact if you measure closely enough. Are some of the tones a microsecond longer or shorter than the last? Are some tones a fraction of a dB louder or softer than the others? Again, even the relationship from one tone to the next can carry meaning.
    (Maybe it's just me, but the last tone in tmpMuPn41 seems a bit longer.) (Listen to tmpEwSE0U, the tone in slide 003 clearly has some sort of modulation behind it.)

    Also, looking at the most recent posts, they're currently posted about every 20 minutes (3/hr), around the clock. This is clearly an automated process, or a very regimented team.

    I like a good conspiracy theory as much as the next guy. Numbers station? Maybe, but I doubt it. It looks more like some sort of automated data collection.

    What's in a name: Webdriver Torso
    First, a Torso is a body with no head, or the center of a body. The imagination can go any number of directions from there. (We all know of at least one famous organization/body with no head.) I think it's likely the body of a process or program. Not the brains or decision making part here, just the data presentation, or data being transmitted from one place to many, in what is likely a non-human readable form.
    We're all familiar with wardriving. Could this be the output of a botnet equivalent? There are clearly enough data points here to represent IP addresses and some sort of up/down/degraded status.

    Just my two cents, or two hours worth.

About the author

Paul Ducklin is a passionate security proselytiser. (That's like an evangelist, but more so!) He lives and breathes computer security, and would be happy for you to do so, too. Paul won the inaugural AusCERT Director's Award for Individual Excellence in Computer Security in 2009. Follow him on Twitter: @duckblog