Actress Scarlett Johansson released a statement this week expressing anger and concern that OpenAI used a voice “eerily similar” to her own as a default voice for ChatGPT.
The voice in question, called Sky, has been available to users since September 2023, but the resemblance to Johansson’s voice became clearer last week when OpenAI demoed an updated model called GPT-4o. Johansson claims that OpenAI’s CEO Sam Altman previously asked her if she would provide her voice for ChatGPT, and she had declined the invitation.
The warm and playful tone of Sky’s voice bears a striking resemblance to the digital companion called Samantha in the film Her (2013), voiced by Johansson. Although Altman has since claimed that Sky’s voice was never meant to resemble Johansson’s, he seemed to allude to this connection by simply tweeting the word “her” on May 13, 2024 — the day that GPT-4o launched.
OpenAI has since explained their process for creating Sky’s voice in a blog post, stating that the voice was provided by “a different professional actress using her own natural speaking voice.” However, because ever-smaller audio samples are enough to generate a synthetic voice, cloning a person’s voice without their consent is easier than ever.
As a sound studies scholar, I’m interested in the ways that AI technology is introducing new questions and concerns about voice and identity. My research situates recent developments, anxieties and aspirations about AI within longer histories of voice and technology.
Stolen voices
This is not the first time a performer has objected to an unlicensed simulation of their voice.
In 1988, Bette Midler pursued legal action against Ford Motor Company for using a voice resembling hers in a series of advertisements. The U.S. Court of Appeals for the Ninth Circuit ultimately ruled in her favour, with Circuit Judge John T. Noonan writing in his decision that “to impersonate her voice is to pirate her identity.”
Tom Waits launched a similar and successful lawsuit against Frito-Lay after hearing what sounded like his own gravelly voice in a radio commercial for Doritos. As musicologist Mark C. Samples describes, this case “elevat[ed] a person’s vocal timbre to the level of his or her visual representation” in the eyes of the law.
Legislators have only just begun to tackle the challenges and risks that accompany the increased adoption of AI.
For example, a recent ruling by the Federal Communications Commission banned robocalls that use AI-generated voices. In the absence of more specific policy and legal frameworks, the Midler and Waits cases continue to act as important precedents.
Chatbots and gender
OpenAI’s apparent reference to the movie Her in the design of Sky’s voice also situates ChatGPT within a long-standing tradition of assigning female voices and personas to computers.
The first chatbot was built in 1966 by MIT professor Joseph Weizenbaum, who designed the program, called ELIZA, to communicate with its users in the same manner as a psychotherapist. ELIZA was an influence and reference for today’s digital assistants, which often have feminized voices as their default setting. When first launched in 2011, Siri told stories about ELIZA as if it were a friend.
Many technoscience scholars, including Thao Phan and Heather Woods, have criticized the way tech companies appeal to gender stereotypes in the design of voice assistants.
Communication scholars Jessa Lingel and Kate Crawford suggest that voice assistants invoke the historically feminized role of the secretary, as they undertake both administrative and emotional labour. They argue that, by referencing this submissive trope, tech companies seek to distract users from the surveillance and data extraction that voice assistants carry out.
OpenAI says that when casting for ChatGPT’s voices, they sought out “an approachable voice that inspires trust.” It is telling that the voice the company chose to make users feel at ease with rapid advances in AI technology sounds like a woman. Even as the conversational abilities of voice assistants become much more advanced, Sky’s voice demonstrates that the tech industry has yet to move on from these regressive tropes.
Protecting our voices
Johansson’s statement ends with a call for “transparency and the passage of appropriate legislation” to protect vocal likeness and identity. Indeed, it will be interesting to see what legal and policy ramifications might follow from this high-profile case of unauthorized voice simulation.
However, celebrities are not the only ones who should be concerned about how their voices are being used by AI systems. Platforms like Zoom and Otter.ai already record our voices and use them to train AI, and our voices are also used in the training of virtual assistants like Alexa.
The illicit AI impersonation of Johansson’s voice might seem like a story from a dystopian future, but it is best understood in the context of ongoing debates about voice, gender and privacy. It’s a sign not of what’s to come, but of what already exists.
The post “ChatGPT’s use of a soundalike Scarlett Johansson reflects a troubling history of gender-stereotyping in technology” by Alex Borkowski, PhD Candidate, Communication & Culture, York University, Canada was published on 05/23/2024 by theconversation.com