OpenAI has introduced an AI Voice Engine that clones a voice based on a 15-second sample
01.04.2024 | 16:43 | OpenAI has presented the results of preliminary testing of its Voice Engine AI model, which uses a 15-second audio sample to read entered text aloud realistically, reproducing the speaker's voice and speech characteristics as closely as possible.
Early versions of the Voice Engine appeared at the end of 2022 and were used in the text-to-speech API, as well as in ChatGPT Voice and Read Aloud. OpenAI acknowledges the consequences of possible misuse of synthesized voice technology and hopes to receive public feedback on both the potential dangers and the areas of application. As for the latter, since last year OpenAI has let a small group of partners test the Voice Engine, and they have provided the following usage examples:
Reading assistance for children and other people with reading difficulties, using natural, emotional and diverse voices. For example, Age of Learning, an educational technology company, uses the Voice Engine to voice its content and, together with GPT-4, to generate personalized responses to students in real time.
Translating content such as videos and podcasts, which lets authors and companies expand their audience worldwide while speaking in their own voices and those of their employees. One of the pioneers is HeyGen, an AI visual storytelling platform for corporate clients that creates human-like avatars for purposes ranging from product marketing to sales pitches. The Voice Engine retains the speaker's native accent, so when English text is voiced from a French speaker's sample, a French accent will be heard.
Support for people with speech impairments, along with therapeutic applications and educational aids. Livox is an AI application for augmentative and alternative communication (AAC) devices that helps people with communication difficulties. Using the Voice Engine, Livox will offer nonverbal people unique, non-robotic voices. Users can choose the voice that suits them best, and that voice can speak different languages.
Helping patients recover their voice after sudden or degenerative speech disorders. The Norman Prince Neuroscience Institute (NPNI) is conducting a pilot program to help people with speech disorders of oncological or neurological origin. Because the Voice Engine can reproduce speech from a 15-second sample, the institute was able to use audio from a video recorded for a school project to restore the voice of a young patient with speech problems caused by a vascular brain tumor.
OpenAI's partners have agreed to usage rules that prohibit using the technology to impersonate another person. The same cannot be said of potential attackers, who are increasingly using neural network technologies for criminal purposes.
ORIENT news