Mannequin Challenge videos teach computers to see

Remember the Mannequin Challenge? It was a short-lived 2016 phenomenon where groups of people would stand still in elaborate poses while someone moved around and filmed them. It looked a lot like the ‘bullet time’ visual effect, famously used in the Matrix movies, where people would seem to stop in mid-air as bullets whizzed around them. Some of the submissions were elaborate and took a lot of effort. James Corden’s is a case in point.

Like lots of other online content, the thousands of Mannequin Challenge videos that made their way onto YouTube have been repurposed.

A team of Google researchers have collected these videos for training an AI system that will help computers see 3D scenes as people do.

In their paper, the scientists explain that our understanding of object persistence lets us keep track of how far apart objects are in 3D space, even when they move around and pass behind each other, and even when we have one eye shut (which removes the stereoscopic depth cues that two eyes provide). That’s harder for computers to do.

Computers use AI to learn this kind of thing, but they need lots of data to learn from. In this case, what they needed were videos of static objects with a camera that moves around them.

Thanks to the crazy place that is the internet, they surfaced thousands of Mannequin Challenge videos to help. The videos were just what the researchers needed to teach computers about the depth and ordering of objects. They said:

We found around 2,000 candidate videos for which this processing is possible. These videos comprise our new MannequinChallenge (MC) Dataset, which spans a wide range of scenes with people of different ages, naturally posing in different group configurations.

Because the people in the videos are static, the researchers can match their key features across multiple frames and use them to compare depth. The data wasn’t all usable as found, and the researchers had to filter out problems such as camera blur. They also had to remove parts of the video with synthetic background (like posters, say) or people who just had to scratch an ear as the camera moved past.
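To get a feel for why a frozen scene helps, here is a minimal sketch of the textbook two-view case: when nothing in the scene moves, any shift of a feature between two frames comes only from the camera's own movement, so a bigger shift means a closer object. This is an illustration, not the researchers' pipeline. The function name, the sideways-only camera movement, and all the numbers are assumptions made up for the example.

```python
def depth_from_parallax(focal_px, baseline_m, x_first_px, x_second_px):
    """Estimate the distance to a static point seen in two frames.

    Assumes (hypothetically) that the camera slid sideways by baseline_m
    metres between the frames without rotating, and that focal_px is the
    focal length expressed in pixels.
    """
    # Horizontal shift of the matched feature between the two frames.
    disparity_px = abs(x_first_px - x_second_px)
    if disparity_px == 0:
        raise ValueError("no shift between frames, so depth cannot be estimated")
    # Standard pinhole relation: depth = focal length * baseline / disparity.
    return focal_px * baseline_m / disparity_px


# Made-up pixel positions for two people matched across two frames.
near_person_m = depth_from_parallax(1000.0, 0.5, 640.0, 440.0)  # shifts 200 px
far_person_m = depth_from_parallax(1000.0, 0.5, 700.0, 650.0)   # shifts 50 px

print(near_person_m, far_person_m)
```

If a person scratches an ear between frames, part of the shift comes from their own movement rather than the camera's, and the relation above no longer holds. That is why footage with moving people had to be removed.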

The results were positive, although there were some limitations. The technique is good at recognizing depth and ordering between humans, but not so good at non-human subjects, like cars.

As with all technology research, an AI that lets computers judge the distance between people using a single lens could have many applications. You could envisage its use in smartphone cameras, making them better at shooting people, or in monocular hunter-killer drones, making them better at, um, shooting people.

That raises a question: should people have a say in whether their image or other personal data is used in AI training? The participants in those YouTube videos couldn’t have known what an obscure Google research team would use them for, and now have no say in where that research goes or how it’s used. Surely this is something that GDPR is there to protect, with its demand that companies explain exactly what personal data will be used for?

This isn’t the first time people’s data has been co-opted for AI datasets. IBM compiled a dataset of one million faces, harvested from the Flickr photo sharing site, to improve the diversity of its facial recognition system. In March, NBC discovered that those people had not given permission.