In the original story of the genie in a bottle from One Thousand and One Nights, the genie threatens to kill the fisherman who freed him – a tale that seems to be resonating with OpenAI, as it continues to pursue advanced voice cloning and synthetic audio and video tools that it says come with major risks.
In a blog post, the company says testing shows its Voice Engine is so good at deepfake voice cloning and synthetic audio that it would almost certainly be misused on wide release. That conclusion has prompted the ChatGPT maker to hold the product back until it establishes stronger rules and guidelines for deployment.
Developed in 2022, Voice Engine is an update to technology already used in OpenAI’s text-to-speech API and the conversation mode of ChatGPT. The blog says Voice Engine “uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. It is notable that a small model with a single 15-second sample can create emotive and realistic voices.” The company has not disclosed the source of the emotionally rich data used to train Voice Engine, but told TechCrunch that the model “was trained on a mix of licensed and publicly available data.”
Beginning, perhaps, to understand the full-scale implications of a free, easily accessible tool that can recreate the realistic voice of anyone from whom it has a 15-second sample, the company says it is now “taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse.”
“We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities,” says the blog post. “Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.”
According to a report from Ars Technica, the current terms and conditions for companies testing Voice Engine prohibit the impersonation of an individual or organization “without consent or legal right.” They mandate clear disclosure that AI is being used to clone voices, and informed consent from anyone whose voice is being cloned. In addition, OpenAI watermarks audio produced with Voice Engine to make it easier to identify.
Nonetheless, the company makes clear its belief that stopping the speeding train of generative AI is not an option, and that it is up to society to change with the times. “We hope this preview of Voice Engine both underscores its potential and also motivates the need to bolster societal resilience against the challenges brought by ever more convincing generative models,” says its post. To start, it suggests phasing out voice authentication as a means of ID verification for banking and other sensitive use cases, increasing public education on AI, “exploring policies to protect the use of individuals’ voices in AI” and accelerating the development of liveness detection, watermarking and other tools to distinguish real voices from synthetic cloned audio.
It is worth noting that OpenAI has rung this particular bell before, likewise warning that its facial recognition capabilities and its text-to-video API Sora are so astonishingly good that they could transform the world.