Weird and wonderful: the “Webdriver Torso” mystery videos explained – and remystified!


An esoteric mystery has taken the internet by storm!

If you’re not up to speed with the latest in conspiracy theory, you need to know about the eerie YouTube sensation known as Webdriver Torso.

It’s a YouTube channel with nearly 80,000 videos, pumped out in batches.

Each one consists of ten peculiar frames, each with two randomly sized and placed rectangles, one blue and one red.

Sometimes you can’t see the blue rectangle: it seems that the red rectangle is always drawn second, and occasionally eclipses the blue one entirely.

In the bottom left-hand corner is the enigmatic text aqua.flv - Slide 000x, where x runs from zero to nine in each clip.

As each frame appears on the screen, a one-second beep of random pitch can be heard.

Except for the last image, which is cut short.

Here are the frames and tone frequencies from a recent upload to the Webdriver Torso channel:

Listen to the tones from tmpdKHvbS

Who? Why? What for? What does it mean?

The obvious answer, apparently, if you know your Cold War history, is that it’s a modern-day Numbers Station.

No-one quite knows what Numbers Stations were for, although it seems pretty obvious.

They pumped out creepy-sounding chains of dispassionately enunciated letters or numbers, sometimes 24 hours a day, from high-powered shortwave transmitters.

There was the Lincolnshire Poacher, named after the musical identification signal it used:

Listen to a fragment from the Lincolnshire Poacher

And there was YHF, or Yankee Hotel Foxtrot, with its oft-repeated signal identification:

Listen to code groups from Yankee Hotel Foxtrot

Doesn’t that make the hairs on your arm stand on end?

The smart money, of course, assumes that Numbers Stations delivered encrypted messages over a public medium for undercover agents of various countries.

Interestingly, it seems that pinpointing a very high-powered shortwave transmitter by triangulation is surprisingly difficult, even (or perhaps especially) if it’s in your backyard.

→ Triangulation means taking two directional readings of a distant object – a gun battery, for instance, or the source of a radio signal – from two different, known points and plotting the readings as lines on a map. Since two lines that aren’t parallel meet at one and only one point, the intersection of the lines locks in the location of the target. The two known points and two known angles uniquely define a triangle, giving the name triangulation. More than two readings can be used to reduce error and improve precision.
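As a rough illustration of the idea, here is a minimal sketch that intersects two bearing lines on a flat map using awk. The function name fix_position, the coordinate units and the sample readings are our own assumptions for the example; real direction finding over thousands of kilometres would also have to account for the curvature of the earth.

```shell
# Intersect two bearing lines on a flat map (illustrative sketch only).
# Arguments: x y bearing for the first observer, then x y bearing for the second.
# Bearings are in degrees, measured clockwise from north (the +y direction).
fix_position() {
    awk -v xa="$1" -v ya="$2" -v ba="$3" -v xb="$4" -v yb="$5" -v bb="$6" 'BEGIN {
        rad = atan2(0, -1) / 180              # degrees to radians
        dax = sin(ba * rad); day = cos(ba * rad)
        dbx = sin(bb * rad); dby = cos(bb * rad)
        det = dax * dby - day * dbx           # zero when the lines are parallel
        if (det > -1e-12 && det < 1e-12) { print "parallel"; exit }
        t = ((xb - xa) * dby - (yb - ya) * dbx) / det
        printf "%.2f %.2f\n", xa + t * dax, ya + t * day
    }'
}

# Observer A at (0,0) looks north-east; observer B at (10,0) looks north-west.
fix_position 0 0 45 10 0 315    # prints 5.00 5.00
```

Parallel readings never cross, which is why the sketch reports them instead of dividing by zero.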

To paraphrase one shortwave aficionado, “Because high-powered shortwave transmissions travel so far, the legs of the triangle are pretty long.”

So, zooming in accurately on the far end of those two long legs isn’t as easy as it sounds.
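To put an approximate number on that, the sketch below estimates how far a bearing line drifts sideways for a given angular error. The function name bearing_miss_km, the one-degree error and the two distances are assumptions chosen purely for illustration, and the calculation treats the map as flat.

```shell
# Sideways miss (in km) of a bearing line, for a given range and angular error.
# Arguments: range in km, bearing error in degrees. Flat-map approximation.
bearing_miss_km() {
    awk -v d="$1" -v deg="$2" 'BEGIN {
        rad = deg * atan2(0, -1) / 180
        # POSIX awk has no tan(), so divide sin by cos.
        printf "%.1f\n", d * sin(rad) / cos(rad)
    }'
}

bearing_miss_km 50 1      # transmitter 50 km away:   prints 0.9
bearing_miss_km 5000 1    # transmitter 5000 km away: prints 87.3
```

The same one-degree wobble that costs less than a kilometre at short range costs tens of kilometres when the legs of the triangle span a continent.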

That just adds to the mystique of Numbers Stations: probably organised and paid for by national governments; mysteriously absent from any official lists of licensed transmitters; hiding in plain sight; and broadcasting spooky content to the entire world with the intention that perhaps only a single person might ever want or need to receive it.

Numbers Stations in the internet era

Even in today’s internet era, shortwave has some handy properties:

  • With the right atmospheric conditions (the radio waves bounce back to earth off the ionosphere), a single transmitter can reach most of the earth.
  • Shortwave receivers are easier to acquire and use than computers and modems, and have much more modest power requirements.
  • No routers, repeaters or service providers are needed to get the signal from the sender to the receiver.

There is a not-inconsiderable disadvantage, however, compared to the internet:

  • The receiver can’t reply by the same medium. (Shortwave transmitters are more complex, expensive and incriminating than a handheld radio or laptop.)

So, if you wanted the internet equivalent of a Numbers Station – an openly visible way of broadcasting clandestine data – then placing abstruse videos in a single, weirdly-named YouTube channel would be a peculiarly bad way of doing it.

It’s absurdly inefficient, and (in a repressive country with strict internet logging and filtering) surprisingly easy to detect when someone has been “tuning in.”

There are so many other, better, less suspicious places online to stash secret data in plain sight – just think how many apparently unexceptionable websites already pump out otherwise random content, notably in cookies and other tracking codes, that could be put to a dual purpose.

Having said that, here is Sophos Naked Security’s very own YouTube Numbers Station sensation, as a bit of weekend fun.

What does it mean? Do the rectangles carry any data? Is the soundtrack trying to tell you anything?

If you think you know the answers, tell us in the comments. (Use your imagination. We did.)

Do-it-yourself Webdriver Torso videos

If you’d like to make your very own “Webdriver Torso” style video clips, here are some tips on how to do it.

We used the command-line tools SoX, GraphicsMagick and FFmpeg, which you should be able to acquire for most operating systems.

(If you can’t get Sox to work on Windows, you can use the Generate | Tone menu option in the graphical audio editor Audacity instead.)

To produce a one-second pure sine wave tone, use sox:

$ sox -n tone.wav synth 1 sine 440

To reproduce a pseudo-random sequence, try this:

$ sox -n 1.wav synth 1 sine 1327 
$ sox -n 2.wav synth 1 sine 1172 
$ sox -n 3.wav synth 1 sine 737 
$ sox -n 4.wav synth 1 sine 827 
$ sox -n 5.wav synth 1 sine 1110 
$ sox -n 6.wav synth 1 sine 995 
$ sox -n 7.wav synth 1 sine 592 
$ sox -n 8.wav synth 1 sine 649 
$ sox -n 9.wav synth 1 sine 990 
$ sox -n a.wav synth 1 sine 879 
$ sox -n b.wav synth 1 sine 554 
$ sox -n c.wav synth 1 sine 655 
$ sox -n d.wav synth 1 sine 885 

You can stitch all thirteen together into a 13-second clip with:

$ sox 1.wav 2.wav 3.wav ... c.wav d.wav whatisit.wav

You’ll be able to tell if you did it correctly by listening to whatisit.wav. (It isn’t random, in fact, or even pseudo-random.)
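If typing thirteen commands feels like hard work, a loop can write them for you. The sketch below only prints the sox commands, so you can inspect them before running anything. The function name tone_commands and the file names seq01.wav to seq13.wav are our own inventions for the example; the frequencies are the ones listed above.

```shell
# Frequencies (Hz) copied from the thirteen tone commands above, in order.
FREQS="1327 1172 737 827 1110 995 592 649 990 879 554 655 885"

# Print one sox command per tone, then the command that joins them all.
# Files are numbered with two digits so they stay in playback order.
tone_commands() {
    i=1
    files=""
    for hz in $FREQS; do
        name=$(printf 'seq%02d.wav' "$i")
        echo "sox -n $name synth 1 sine $hz"
        files="$files $name"
        i=$((i + 1))
    done
    echo "sox$files whatisit.wav"
}

tone_commands
```

Once the printed commands look right, you could run them by piping the output into a shell, for example with tone_commands | sh.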

To produce the random blue/red rectangles, use GraphicsMagick.

We chose 640×360 (the resolution at which we received the Webdriver Torso videos), and we wrote the text first, then the blue rectangle, and then the red:

$ gm convert -size '640x360' xc:white \
   -font 'Courier-Bold' -fill black \
   -draw 'text 10,340 "aqua.flv - Slide 0000"' \
   -fill blue -draw 'rectangle 75,85 175,95' \
   -fill red  -draw 'rectangle 10,10 50,50' \
   image0.png

Now assume you have ten images, image0.png to image9.png, and ten one-second sound tones, tone0.wav to tone9.wav.
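One possible way to get those ten images is to let awk pick the rectangle corners at random and print a gm command for each frame. This is only a sketch: the function name frame_commands and its optional seed argument are assumptions of ours, and the commands are printed rather than run so that you can check them first.

```shell
# Print ten gm commands, one per frame, each with a random blue and red rectangle.
# Optional argument: a seed, so that a run can be repeated exactly.
frame_commands() {
    awk -v seed="$1" -v q="'" 'BEGIN {
        if (seed == "") srand(); else srand(seed)
        W = 640; H = 360
        colour[1] = "blue"; colour[2] = "red"    # red is drawn second
        for (n = 0; n < 10; n++) {
            printf "gm convert -size %dx%d xc:white", W, H
            printf " -font Courier-Bold -fill black"
            printf " -draw %stext 10,340 \"aqua.flv - Slide %04d\"%s", q, n, q
            for (c = 1; c <= 2; c++) {
                # Pick a top-left corner, then a bottom-right corner
                # somewhere below and to the right of it.
                x0 = int(rand() * (W - 1)); y0 = int(rand() * (H - 1))
                x1 = x0 + 1 + int(rand() * (W - 1 - x0))
                y1 = y0 + 1 + int(rand() * (H - 1 - y0))
                printf " -fill %s -draw %srectangle %d,%d %d,%d%s", colour[c], q, x0, y0, x1, y1, q
            }
            printf " image%d.png\n", n
        }
    }'
}

frame_commands
```

To create the images for real, you could pipe the output into a shell with frame_commands | sh. Because the red rectangle is drawn after the blue one, it will sometimes hide the blue one entirely, just as in the original videos.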

Knit the tones together into a single WAV file:

$ sox tone[0-9].wav torso.wav

Now knit the images and the combined sound file into a video with ffmpeg:

$ ffmpeg -r 1 -i image%d.png -i torso.wav torso.mov

You can change the extension .mov if you want to produce a different video file type, and alter the frame size of the final video by using the command line option -s WIDTHxHEIGHT.


And don’t think that, by deconstructing the Webdriver Torso phenomenon into a few simple command lines, we have ruined the mystery.

We’ve only told you how to make videos that are like the Webdriver ones.

In a way, we’ve cast the mystery into even sharper relief!

And don’t forget to tell us your theories about our very own Numbers Station Sierra November Sierra in the comments below…