Expanding on how Voice Engine works and our safety research
Exploring the technology behind our text-to-speech model.
We first developed Voice Engine in late 2022. Early on, to assess the capabilities and limitations of our Voice Engine model, we tested it internally using a mix of public and private voice samples. This internal prototype was essential to our alignment and safety research, informed our safeguards, and continues our commitment to understanding the technical frontier.
Importantly, these outputs were reserved for internal testing, not for training the models that power our products.
As part of our iterative deployment framework, this early prototype also played a valuable role in helping policymakers understand the capabilities of synthetic voice models. For instance, starting last summer we showed global policymakers at the highest levels the technology's potential and discussed the associated risks with them.
In September of 2023, we used Voice Engine to power ChatGPT’s Voice Mode feature. Because these capabilities also presented new risks, we launched it only for this specific use case. Voice Mode was created solely from real voices, carefully selected through a detailed process that began in May 2023 involving professional voice actors, talent agencies, casting directors, and industry advisors.
In November of 2023, we released a simple TTS API also powered by Voice Engine. We chose another limited release where we worked with professional voice actors to create 15-second audio samples to power each of the six preset voices in the API. Developers can build these into their websites to read blog posts out loud, for example.
In March of this year, we previewed Voice Engine’s capability of creating custom voices with a small set of trusted partners. This initiative aimed to raise awareness about the capabilities of synthetic voices and support the following goals:
- Phasing out voice-based authentication as a security measure for accessing bank accounts and other sensitive information
- Exploring policies to protect the use of individuals' voices in AI
- Educating the public about the capabilities and limitations of AI technologies, including the possibility of deceptive AI content
- Accelerating the development and adoption of techniques for tracking the origin of audiovisual content, so it's always clear when you're interacting with a real person or with an AI
These small-scale deployments are also helping to inform our approach, safeguards, and thinking on how Voice Engine could be used for good across various industries.