AI-Generated Illusions: Navigating Deepfake Videos and Synthetic Voices
This month (January 2024), the San Francisco Chronicle reported the story of a woman who received a fake phone call, purportedly from her son, asking for $15,000 in bail money. The scammer had cloned the son's voice and used it to fabricate a desperate situation. Cases of scammers using AI to dupe unsuspecting victims are rising sharply.
Remarkably, the 1987 movie “The Running Man” foresaw a future in which deepfake technology is used to deceive the public. The 1992 Michael Crichton novel Rising Sun also had a plot point in which security footage was modified to frame an innocent person, although that example relied on the Photoshop-like digital tools of the time and manual editing.
Now, the future is here. Anyone with a GPU-equipped laptop can make a deepfake video. I am going to discuss two pieces of software. Swapface is an ultra-easy application that lets you deepfake a video or a photo, or even modify your webcam video in real time. Reactor is open-source software that requires some Python knowledge to set up, but once configured properly, it is just as easy to use. (Reactor is a successor to Roop, which ceased development.)
Using these tools, I made the following video, putting Sylvester Stallone in an iconic Schwarzenegger role.
I mixed footage generated by Swapface and Reactor because each has its own pros and cons. Swapface could not really process faces at extreme angles (for example, when the Terminator turns his face fully sideways). Reactor also introduced distortions in such cases, but slightly fewer than Swapface.
Swapface has its own strengths: it can handle multiple faces in the same scene, and its “Expert Mode” can add more detail than Reactor.
What is remarkable is that both tools need only a single example image of a face to get started.
Faking Audio
ElevenLabs is among the most popular text-to-speech software providers. It also lets you upload voice samples to create a “clone” of a voice. What’s noteworthy about ElevenLabs is that this cloning, or training, happens remarkably quickly. You need only one minute of sample audio to clone a voice, and within seconds you can start generating new speech with the cloned voice.
Play.HT is a lesser-known site. However, as of this writing, I find it more feature-rich for text-to-voice generation than ElevenLabs. I found the samples generated by Play.HT a little closer to the original voice. Play.HT offers 60 languages compared to ElevenLabs’ 30. Play.HT also offers more control over voice generation and more accents. It currently boasts 142 variations in languages and accents.
Here is a deepfake video that has not only Sylvester Stallone’s face but also his voice, cloned using Play.HT. I collected the voice samples from interviews he gave after the release of the original Rocky film.
Do We Have Any Defense Against Deepfakes?
Just as there are tools to help detect ChatGPT-generated text, software is being developed to identify deepfakes and AI-generated voices. However, the very need for such tools raises many troubling questions. Will we see an escalating battle between forgers and defenders? Voice authentication, once embraced by many institutions, now faces trust issues. Can courts of law still accept audio recordings as evidence, relying on AI voice detection software? Understandably, even the US Congress has expressed concern about AI's ability to generate fake media, potentially influencing the 2024 elections.
On the other hand, deepfake technology has many legitimate use cases. The image and voice processing technology can be used to produce quality media effects at low cost. When the Disney Plus show “The Mandalorian” introduced a digitally recreated young Luke Skywalker, a YouTube channel called Corridor Crew produced its own version of the young Luke Skywalker using deepfake technology. Many online commentators opined that the Corridor Crew version looked superior to the Lucasfilm version.
As we can see, AI technology has the power to empower both criminals and creators. The world will need to strike a balance between innovation and integrity. Hopefully, we will see more beneficial applications of the technology going forward.