Sonantic has figured out how to use AI to turn the written words of a script into spoken dialogue, and it can infuse those words with the proper emotion.
And it turns out this is a pretty good way to prototype the audio storytelling in triple-A video games. That’s why the Sonantic technology is finding use with 200 different video game companies for audio engineering.
The company says its AI can give the words real emotional depth, conveying complex human emotions from fear and sadness to joy and surprise. The result is hyper-realistic, emotionally expressive, and controllable artificial voices, which Sonantic pitches as a major advance in audio engineering for game and film studios.
“Our first pilots were for triple-A companies, and then when we started building this,” said cofounder Zeena Qureshi in an interview with GamesBeat. “We went a lot more vertical and deeper into just working very closely with these types of partners. And what we found is the highest quality bar is for these studios. And so it’s really helped us bring our technology into a very great place.”
London-based Sonantic builds on the existing framework of text-to-speech, and its approach is what differentiates a standard robotic voice from one that sounds genuinely human. Creating that "believability" factor is at the core of Sonantic's voice platform, which captures the nuances of the human voice.
Obsidian Entertainment audio director Justin Bell said in a video that the technology will enable game companies such as his own to cut production timelines and costs. Bell said his team could send a script through Sonantic's application programming interface (API) and get back something that isn't just robotic dialogue. It comes back sounding like a real human conversation, and Bell said that could empower the team to tell a better story.
“It’s just really useful hearing something back very early in the process,” Qureshi said.
You could simply use these scripts and the generated voices to populate the dialogue of a game's non-player characters. But the point isn't to put voice actors out of work, Qureshi said. Rather, it gives creators a reviewable, spoken version of the script much earlier in the creative process, so they can listen to the dialogue and change it if it clearly doesn't sound right, she said.
To demonstrate its voice-on-demand technology, Sonantic has released a demo video highlighting its partnership with Obsidian, maker of The Outer Worlds and a subsidiary of Microsoft's Xbox Game Studios. Others using Sonantic include Splash Damage and Sumo Digital.
Sonantic partners with experienced actors to create voice models. Clients can choose from existing voice models or work with Sonantic to build custom voices for unique characters. Project scripts are then uploaded to Sonantic’s platform, where a client’s audio team can choose from a variety of high-fidelity speech synthesis options including pitch, pacing, projection, and an array of emotions.
Film and game studios are not the only beneficiaries of Sonantic's platform. Actors can maximize both their time and talent by turning their voices into a scalable asset, as the Sonantic technology takes their voices and uses them to create different variations. Sonantic's revenue share model lets actors earn passive income every time their voice model is used in a client's project, spanning development, preproduction, production, and post-production.
"This technology isn't made to replace actors," Qureshi said. "What it actually helps with is at the very beginning of game development. Triple-A games can take up to 10 years to make. But they typically don't get in actors at the very early stages, because they're constantly iterating. So they use text-to-speech that's been an industry standard for the last few decades. But we've created a way that helps actors work virtually as well as in person. And it helps studios get voices into their game, highly realistic voices into their game from the very beginning to help them feel out the story arc, fill out the pacing, really understand what needs to change, so that their iteration cycles can continue to go really fast."
Sonantic’s official launch follows last year’s beta release, which was captured in a video entitled Faith: The First AI That Can Cry.
The result is a streamlined production process. Teams won't have to call actors back for re-recording sessions or re-edit voice lines as often.
“Some of our studios have told us they save a week of time for their team every month,” Qureshi said.
An accelerator meeting
Qureshi met cofounder John Flynn in 2018. He had a great demo of the technology, and Qureshi had a background in speech and language therapy.
“When I heard his demo, I was like, ‘This is insane!’” Qureshi said. “It sounded better than any text-to-speech I’ve ever heard. And then he told me how he did it. And I thought, ‘This is exactly how I teach children.’”
Before that demo, all the text-to-speech algorithms Qureshi had heard flattened the delivery of the performance, so that it sounded robotic.
"The technology before didn't capture the highs and lows of the voice," Flynn said. "I changed it to make it work better by looking for those highs and lows and trying to get the algorithm to focus on that more."
Qureshi added, “The devil is in the details with communication. There are so many different ways to say something. So when I’m teaching a child, I have to teach them emotions. I have to teach them how to enunciate very clearly, how to project their voice, really use their voice as an instrument, and control it.”
Flynn said that most of the work of the past few years has gone into getting the models to do what Qureshi does with children.
“Last year, we had the AI that could cry, with emotion and sadness,” Flynn said. “It’s really about the nuances in speech, that quiver of the voice for sadness, an exertion for anger. We try and model those really deeply. Once you add in those details and layer them on top, you start to get energy and it becomes really realistic.”
Besides games, Sonantic works with film and TV productions. The company has 12 employees, and it has raised $3.5 million to date from investors including AME Cloud Ventures, EQT Ventures, and Krafton Ventures.
Article: Sonantic uses AI to infuse emotion in automated speech for game prototypes