Universal translators in science fiction, such as the Babel fish in The Hitchhiker’s Guide to the Galaxy, have long offered the dream of instantaneous translation from one spoken language to another. Now, in what may be a key step toward making this fantasy a reality, scientists at Facebook’s parent company Meta have developed an AI system that can instantly translate speech and text, including direct speech-to-speech translations, for up to 101 languages.
“Science fiction provides a clear goal that our group can focus on,” says Marta Costa-jussà, a research scientist at Meta’s Fundamental AI Research team in Menlo Park, California. The scientists described their work on 15 January in the journal Nature.
As the world grows more interconnected, people have more access to multilingual content than ever. However, most automated translation systems are designed to take in and put out text only. Until now, the speech-to-speech machine translation systems that did exist covered significantly fewer languages than text-to-text systems. Moreover, previous speech-to-speech systems were often skewed toward translating a given language into English, rather than from English into another language.
Meta’s SeamlessM4T Translation Tech
Now Meta has developed an AI system called SeamlessM4T that can translate speech and text in up to 101 languages. Specifically, it supports speech-to-speech translation from 101 languages into 36, speech-to-text translation from 101 languages into 96, text-to-speech translation from 96 languages into 36, text-to-text translation for 96 languages, and automatic speech recognition for 96 languages. (Whether it can translate between a given pair of languages depends on the availability of quality speech data, Costa-jussà says.)
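For readers who find the run of numbers hard to parse, the coverage figures above can be laid out as a small lookup table. The Python sketch below is only an illustration of the reported counts, not part of Meta's code or API: the names `Coverage`, `COVERAGE`, and `describe` are hypothetical, and treating the text-to-text and speech-recognition figures as 96 languages on both the input and output side is an assumption based on the wording above.

```python
from typing import NamedTuple


class Coverage(NamedTuple):
    """Hypothetical record: how many languages a task accepts and produces."""
    source_languages: int
    target_languages: int


# Counts as reported in the article. The two entries marked "assumption"
# read "for 96 languages" as 96 on both the input and the output side.
COVERAGE = {
    "speech-to-speech translation": Coverage(101, 36),
    "speech-to-text translation": Coverage(101, 96),
    "text-to-speech translation": Coverage(96, 36),
    "text-to-text translation": Coverage(96, 96),      # assumption
    "automatic speech recognition": Coverage(96, 96),  # assumption
}


def describe(task: str) -> str:
    """Return a one-line summary of the reported coverage for a task."""
    c = COVERAGE[task]
    return (f"{task}: up to {c.source_languages} input languages, "
            f"up to {c.target_languages} output languages")


if __name__ == "__main__":
    for task in COVERAGE:
        print(describe(task))
```

Laid out this way, the pattern is easy to see: the system takes in speech for 101 languages and text for 96, but it produces speech for only 36, which is why both speech-output tasks top out at 36 target languages.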
To develop SeamlessM4T, the researchers trained a brain-mimicking neural network AI system on 4 million hours of multilingual audio and tens of billions of sentences from publicly available repositories of web data. They also had it analyze roughly 443,000 hours of audio with matching text—for instance, Internet video clips with subtitles—to further improve the system.
When it came to speech-to-speech translation, the research team found SeamlessM4T’s translations were up to 23 percent more accurate than previous state-of-the-art systems. With speech-to-text tasks, it was 8 percent more accurate than prior systems.
In speech-to-text tasks, SeamlessM4T was also roughly 50 percent more resilient against background noise and variations in how speakers talked. It could also translate utterances mixing two or more languages.
Checking for Toxicity and Bias
To reduce the chances that SeamlessM4T might add profanity and other toxic language to its translations, the researchers employed two strategies to remove toxicity during its training and operation. When they compared SeamlessM4T models to the state of the art, they found these approaches reduced toxicity in translations by up to 20…
The post “Meta’s New Translation AI Is Nearly a Babel Fish” by Charles Q. Choi was published on 01/15/2025 by spectrum.ieee.org