Voice cloning—in which AI is used to create fake yet realistic-sounding speech—has its benefits, such as generating synthetic voices for people with speech impairments. But the technology also has plenty of malicious uses: Scammers can use AI to clone voices to impersonate someone and swindle individuals or companies out of millions of dollars. Voice cloning can also be used to generate audio deepfakes that spread election disinformation.
To combat the increasing dangers posed by audio deepfakes, the U.S. Federal Trade Commission (FTC) launched its Voice Cloning Challenge. Contestants from both academia and industry were tasked with developing ideas to prevent, monitor, and evaluate voice cloning used for nefarious purposes. The agency announced the contest’s three winners in April. These three teams all approached the problem differently, demonstrating that a multipronged, multidisciplinary approach is required to address the challenging and evolving harms posed by audio deepfakes.
3 Ways to Tackle Audio Deepfakes
One of the winning entries, OriginStory, aims to validate a voice at the source. “We’ve developed a new kind of microphone that verifies the humanness of recorded speech the moment it’s created,” says Visar Berisha, a professor of electrical engineering at Arizona State University who leads the development team along with fellow ASU faculty members Daniel Bliss and Julie Liss.
Visar Berisha records his voice using an OriginStory microphone. Photo: Visar Berisha
OriginStory’s custom microphone records acoustic signals just as a conventional microphone does, but it also has built-in sensors to detect and measure biosignals that the body emits as a person speaks, such as heartbeats, lung movements, vocal-cord vibrations, and the movement of the lips, jaw, and tongue. “This verification is attached to the audio as a watermark during the recording process and provides listeners with verifiable information that the speech was human-generated,” Berisha says.
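The article doesn't detail OriginStory's watermarking format, but the pipeline it describes (sense biosignals alongside the audio, score "humanness," and bind that attestation to the recording) can be loosely sketched. The Python below is illustrative only, built on assumptions of my own: a single hypothetical biosignal stream, a plain correlation as the humanness score, and an HMAC-signed metadata record standing in for the in-band watermark. The function names and the device key are invented.

```python
import hashlib
import hmac
import json

import numpy as np

def humanness_score(audio: np.ndarray, biosignal: np.ndarray) -> float:
    """Correlate the audio envelope with a body-borne biosignal
    (e.g., vocal-cord vibration). A real system would fuse several
    sensors; this single correlation is purely illustrative."""
    env = np.abs(audio)
    # Resample the biosignal to the audio length for comparison.
    bio = np.interp(np.linspace(0, 1, env.size),
                    np.linspace(0, 1, biosignal.size), np.abs(biosignal))
    denom = env.std() * bio.std()
    return float(np.corrcoef(env, bio)[0, 1]) if denom else 0.0

def attach_watermark(audio: np.ndarray, biosignal: np.ndarray, key: bytes) -> dict:
    """Bind a 'recorded by a human' attestation to the audio by signing
    the audio hash together with the humanness score. A signed metadata
    record stands in here for OriginStory's actual watermark."""
    score = humanness_score(audio, biosignal)
    audio_hash = hashlib.sha256(audio.tobytes()).hexdigest()
    payload = json.dumps({"audio_sha256": audio_hash,
                          "humanness": round(score, 3)}).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": payload.decode(), "hmac": tag}

# Example with synthetic audio and a loosely correlated 'biosignal'.
rng = np.random.default_rng(0)
audio = rng.normal(size=16000).astype(np.float32)
bio = np.abs(audio[::100]) + rng.normal(scale=0.1, size=160)
print(attach_watermark(audio, bio, key=b"device-secret"))
```

Because the attestation is keyed to a hash of the exact samples, any later tampering with the audio would invalidate the signature, which is the property a listener-verifiable watermark needs.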
Another winning entry, the aptly named AI Detect, intends to use AI to catch AI. Proposed by OmniSpeech, a company that makes AI-powered speech-processing software, AI Detect would embed machine learning algorithms into devices with limited compute power, such as phones and earbuds, to distinguish AI-generated voices in real time. “Our goal is to have some sort of identifier when you’re talking on your phone or using a headset, for example, that the entity on the other end may not be a real voice,” says OmniSpeech CEO David Przygoda.
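OmniSpeech hasn't published AI Detect's model, but the general shape of a compact, frame-by-frame detector running on a low-power device can be illustrated. The Python sketch below is hypothetical throughout: hand-rolled spectral features and an untrained linear scorer stand in for whatever compact network AI Detect would actually ship.

```python
import numpy as np

FRAME = 400  # 25 ms of audio at 16 kHz
HOP = 160    # 10 ms hop between frames

def frame_features(frame: np.ndarray) -> np.ndarray:
    """Cheap spectral statistics a small on-device model could score."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(frame.size)))
    spec = spec / (spec.sum() + 1e-9)
    freqs = np.arange(spec.size)
    centroid = (freqs * spec).sum()                   # spectral center of mass
    flatness = np.exp(np.log(spec + 1e-12).mean()) / (spec.mean() + 1e-12)
    rolloff = np.searchsorted(np.cumsum(spec), 0.95)  # 95%-energy bin
    return np.array([centroid, flatness, rolloff], dtype=np.float32)

def synthetic_probability(audio: np.ndarray, weights: np.ndarray,
                          bias: float) -> float:
    """Score each frame with a tiny linear model and average the scores,
    cheap enough to run in real time on a phone or earbud."""
    scores = []
    for start in range(0, audio.size - FRAME, HOP):
        x = frame_features(audio[start:start + FRAME])
        scores.append(1.0 / (1.0 + np.exp(-(weights @ x + bias))))
    return float(np.mean(scores))

# Illustrative weights only; a real detector would be trained on
# labeled human vs. cloned speech.
w = np.array([0.002, -1.5, 0.004], dtype=np.float32)
audio = np.random.default_rng(1).normal(size=16000).astype(np.float32)
print(f"probability synthetic: {synthetic_probability(audio, w, bias=-0.5):.2f}")
```

The per-frame design matters for the use case Przygoda describes: a streaming score can raise an alert mid-call rather than waiting for the full recording.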
The final winning entry, DeFake, is another AI tool. DeFake adds tiny perturbations to a human voice recording, making precise cloning more difficult. “You can think about the perturbations as small scrambling noises added to a human-voice recording, which AI uses to learn about the signature of a human voice,” says Ning Zhang, an assistant professor of computer science and engineering at Washington University in St. Louis. “Therefore,…
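The perturbation idea can be sketched too, with a strong caveat: DeFake's perturbations are presumably optimized against cloning models, whereas the illustrative Python below simply adds low-level noise shaped by the speech envelope so that it stays hard to hear. The function name and strength parameter are invented for illustration.

```python
import numpy as np

def perturb_voice(audio: np.ndarray, strength: float = 0.02,
                  seed: int = 0) -> np.ndarray:
    """Add a low-level perturbation masked by the speech itself.
    Random energy-shaped noise is only a placeholder for the
    adversarial optimization a system like DeFake would perform."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=audio.size).astype(np.float32)
    # Scale the noise to the local signal envelope, a crude form of
    # psychoacoustic masking: louder speech hides more perturbation.
    window = np.ones(400) / 400
    envelope = np.sqrt(np.convolve(audio ** 2, window, mode="same"))
    return audio + strength * envelope * noise

audio = np.random.default_rng(2).normal(size=16000).astype(np.float32)
protected = perturb_voice(audio)
snr = 10 * np.log10((audio ** 2).mean() / ((protected - audio) ** 2).mean())
print(f"perturbation SNR: {snr:.1f} dB")  # high SNR means the added noise is quiet
```

The goal is asymmetry: the change is small enough that human listeners barely notice it, yet it corrupts the features a cloning model relies on to learn a voice's signature.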
Read full article: New Techniques Emerge to Stop Audio Deepfakes
The post “New Techniques Emerge to Stop Audio Deepfakes” by Rina Diane Caballar was published on 05/30/2024 by spectrum.ieee.org