The startup was founded just a year ago by Mateusz Staniszewski and Piotr Dąbkowski. Their solution generates a synthetic voice from text, or clones a voice from a supplied sound sample. Because the technology is expected to revolutionize the entertainment sector broadly, from audiobooks to film and gaming, investors quickly put in USD 2 million.
– We can build the world’s best company developing voice technologies based on artificial intelligence. Our goal is for all content to be accessible in the future with the highest sound quality, in any language and voice – said Mateusz Staniszewski.
– For many years, we met every six months to work on various technological projects, mainly for fun and as an intellectual exercise. Eventually, we began to think about technology that could analyze speech for sentiment and emotion. That is when the idea behind Eleven Labs was born – said Piotr Dąbkowski.
Several circumstances worked in their favor. First, Dąbkowski had been conducting machine learning research for several years. Second, the AI landscape has recently changed so much that development is no longer reserved for large companies. Third, they quickly understood where their technology could be used, for example in dubbing English-language films. It turned out that while work on generating synthetic text and video is already quite advanced, voice is still at a very early stage of development. They quickly identified which components were available for testing and how to prototype such a solution. They started by collecting an extensive dataset and then trained the algorithms to learn not only how to convert text into voice but also how to take into account the context of the analyzed content.
The results were striking. In January 2022, they established a company. Six months later, the British fund Concept Ventures, the Czech fund Credo Ventures, and several groups of business angels invested USD 2 million in it. According to the investors, the company has created the world’s best text-to-voice technology, capable of generating long-form audio from text. Thanks to it, it will be possible to watch films with Tom Hanks speaking Polish or listen to audiobooks read in English by Polish actors. Every online content creator will be able to publish their materials in any language.
Currently, the company focuses on providing a solution for independent creators working in Polish and English, primarily authors of books and newsletters. About a thousand people have already signed up to test the beta version of the product. After that, the founders intend to enter the media industry, allowing news services to publish their content as audio. Later, they plan to build a solution for automatic dubbing. They will start with documentaries, which are emotionally subdued, but they would like the first Hollywood production to use their solution in 2024. If they manage to build a suite of voice products, they could become a billion-dollar independent business. If not, the technology alone could make the startup an acquisition target for Google, Amazon, or OpenAI.