If you’ve ever wanted to know what it might be like to see Kim Jong-un let loose at karaoke, your wish has been granted, thanks to an app that lets users turn photographs of anyone – or anything remotely resembling a face – into uncanny AI-powered videos of them lip syncing famous songs.
The app is called Wombo AI, and while the future of artificial intelligence and the ability to make fake videos of real people strikes fear into the hearts of many experts, some say that Wombo could help by raising awareness of “deepfakes”.
Wombo CEO Ben-Zion Benkhin said he came up with the idea “while smoking a joint with my roommate on the roof”. The app launched in Canada in February and has since been downloaded from Apple’s App Store and Google Play more than 2m times.
There are 15 songs users can choose from, including Michael Jackson’s Thriller and the more recent Gunther’s Ding Dong Song. The app’s creators filmed a performer singing each song – and executing specific eye, facial and lip movements, Benkhin told Insider. These background videos help the AI animate any uploaded image.
“I’ve been following the AI space, following the meme space, following the deepfake space, and just saw the opportunity to do something cool,” Benkhin said.
However, Wombo may avoid the potential harm posed by deepfakes, which are fake videos of real people that look real. A video created by Wombo “looks realistic but doesn’t look real”, Benkhin told Insider.
Toby Walsh, a professor of artificial intelligence at the University of New South Wales in Sydney, Australia, told the Guardian that while the app was harmless, the increasing availability, ease and realism of technology like this “takes us to a dangerous place”.
However, he said the app’s videos were so clearly fake that they might teach users to mistrust future videos where world leaders, for example, were filmed doing or saying something ridiculous.
Dr Abhinav Dhall, a lecturer at Monash University’s department of human centred computing, said that while it was “comforting” that the quality of Wombo’s videos wasn’t high enough for them to pass as real, social media companies should be tagging the videos as fakes.
There was nothing inherently wrong with computer-generated videos, he said, and because the app was engaging, it might teach people about the existence of deepfakes and how easy it is to create them.
February saw the launch of another AI app, Deep Nostalgia, which uses deep learning to animate old photographs, drawings and even statues, including the much-maligned bronze bust of footballer Cristiano Ronaldo.
Wombo has so far been used to animate Kim Jong-un singing Gloria Gaynor’s I Will Survive. A tap and bathroom sink that resemble a face also give an impressive performance of the disco anthem.
Popular videos shared on Twitter include a four-person chorus of past and present US federal reserve chairs singing Rick Astley’s Never Gonna Give You Up.
Chinese president Xi Jinping has also been shown belting out the 80s tune, while People’s Republic of China founder Mao Zedong sings a convincing I Will Survive. US president Joe Biden also puts in an appearance.
The app suggests using photographs where the subject’s teeth are visible, which makes for a more realistic mouth opening and closing effect. A closed-mouth shot can go horribly wrong, as happened when someone tried to make a video of Muammar Gaddafi singing Cascada’s Everytime We Touch. The app mainly animated his neck.
guardian.co.uk © Guardian News & Media Limited 2010