Real-Time Audio Deepfake Tech Is Here

Early AI deepfakes, while impressive from a technical perspective, were difficult to create and still not entirely convincing.

The technology has advanced quickly since 2020 or so, however, and has recently cleared a key hurdle: It’s now possible to create convincing real-time audio deepfakes using a combination of publicly available tools and affordable hardware. This is according to a report published by NCC Group, a cybersecurity firm, in September. It outlines a “deepfake vishing” (voice phishing) technique that uses AI to recreate a target’s voice in real-time.

Pablo Alobera, managing security consultant at NCC Group, says the real-time deepfake tool, once trained, can be activated with just the press of a button. “We created a front end, a web page, with a start button. You just click start, and it starts working,” says Alobera.

Real-time Voice Deepfakes Can Impersonate Anyone

NCC Group hasn’t made its real-time voice deepfake tool publicly available, but the company’s research paper includes a sample of the resulting audio. It demonstrates that the real-time deepfake is both convincing and free of discernible latency.

The quality of the input audio used in the demonstration is also rather poor, yet the output still sounds convincing. That means the tool could be used with the sort of microphones built into ordinary laptops and smartphones.

Audio deepfakes are nothing new, of course. A variety of companies, such as ElevenLabs, provide tools that can create an audio deepfake with just a few minutes of audio.

However, past examples of AI voice deepfakes were not real-time, which could make the deepfake less convincing. Attackers could pre-record deepfaked dialogue, but the victim could easily catch on if the conversation veered from the expected script. Alternatively, an attacker might try to generate the deepfake on the fly, but it would require at least several seconds to generate (and often much longer), leading to obvious delays in the conversation. NCC Group’s real-time deepfake isn’t hampered by these problems.
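The latency trade-off described above can be made concrete with some back-of-the-envelope arithmetic. A minimal sketch, assuming a chunked streaming pipeline (the function and its parameters are illustrative, not NCC Group's actual implementation): a streaming voice converter processes audio in small frames, so the delay a listener hears is roughly one frame of buffering plus the model's per-frame inference time, whereas a batch model that needs several seconds per utterance cannot keep up with a live conversation at all.

```python
def added_delay_ms(chunk_ms: float, infer_ms: float) -> float:
    """Estimate the minimum delay a chunked real-time voice converter adds.

    chunk_ms: duration of audio buffered per frame before processing.
    infer_ms: time the model needs to convert one frame.

    If inference is slower than real time (infer_ms >= chunk_ms), the
    backlog grows without bound and streaming conversion is impossible.
    Otherwise, the perceived delay is roughly one frame of buffering
    plus the inference time for that frame.
    """
    if infer_ms >= chunk_ms:
        return float("inf")  # model can't keep pace with incoming audio
    return chunk_ms + infer_ms

# A hypothetical streaming model converting 20 ms frames in 10 ms adds
# only about 30 ms of delay -- imperceptible in conversation.
print(added_delay_ms(20, 10))
# A batch model needing 3 s per 1 s of speech can never stream.
print(added_delay_ms(1000, 3000))
```

This also explains why hardware matters: a faster GPU shrinks `infer_ms`, which both keeps the pipeline below the real-time threshold and reduces the residual delay, consistent with the half-second figure NCC Group reports for laptop-class hardware below.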

Alobera says that, with consent from clients, NCC Group used the voice changer alongside other techniques, like caller ID spoofing, to impersonate individuals. “Nearly all times we called, it worked. The target believed we were the person we were impersonating,” says Alobera.

NCC Group’s demonstration is also notable because it doesn’t rely on a third-party service, but instead uses open-source tools and readily available hardware. Though the best performance is achieved with a high-end GPU, the audio deepfake was also tested on a laptop with Nvidia’s RTX A1000. (The A1000 is among the least performant GPUs in Nvidia’s current lineup.) Alobera says the laptop was able to generate a voice deepfake with only a half-second delay.

Real-time Video Deepfakes Aren’t Far Behind

NCC Group’s success in creating a tool for real-time voice…

The post “Real-Time Audio Deepfake Tech Is Here” by Matthew S. Smith was published on 10/21/2025 by spectrum.ieee.org