Voice is the future of human-computer interaction. I've said this several times recently, and AI voice company ElevenLabs has now announced a new product that further underscores the power of conversation in getting things done.
ElevenLabs' conversational AI system is a voice bot that is set up to feel like you're making a phone call, allowing you to converse just like you would with a human.
It is fully customizable, allowing you to select, design, and even clone the voice you want to use. You can also add your own knowledge base. For example, if you are creating a math tutor, you could include access to SAT prep guides.
Most conveniently, you can choose the language model that acts as the agent's brain: OpenAI, Google, and Anthropic models are available, or, if you are running a company, you can plug in your own custom model.
Unlike ChatGPT Advanced Voice, it is not native speech-to-speech; it works like Gemini Live or Meta AI voice: you speak, it converts your speech to text and sends that text to the AI. The AI responds with text, and ElevenLabs uses its existing speech model to convert the response into speech. This happens so fast that it might as well be speech-to-speech.
To make this happen, ElevenLabs engineers had to create a new custom speech-to-text model that could transcribe the user's words fast enough to be unobtrusive.
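The speech-to-text, language model, text-to-speech loop described above can be sketched in a few lines. This is only an illustration of the cascaded design, not ElevenLabs' implementation: the class name, the three stage types, and the toy stand-in functions are all hypothetical, and a real system would call actual speech and language models at each stage.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage types for illustration; none of these are ElevenLabs APIs.
Transcriber = Callable[[bytes], str]   # speech-to-text: audio in, text out
Responder = Callable[[str], str]       # language model: text in, text out
Synthesizer = Callable[[str], bytes]   # text-to-speech: text in, audio out


@dataclass
class CascadedVoiceAgent:
    """One conversational turn as three chained stages."""
    transcribe: Transcriber
    respond: Responder
    synthesize: Synthesizer

    def handle_turn(self, audio_in: bytes) -> bytes:
        text_in = self.transcribe(audio_in)   # 1. the user's speech becomes text
        text_out = self.respond(text_in)      # 2. the chosen model answers in text
        return self.synthesize(text_out)      # 3. the answer becomes speech


# Toy stand-ins so the sketch runs without any real models:
# "audio" here is just UTF-8 bytes.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


agent = CascadedVoiceAgent(fake_transcribe, fake_respond, fake_synthesize)
reply_audio = agent.handle_turn(b"hello")
print(reply_audio)
```

Because each stage is just a function, swapping the language model (for example from an OpenAI model to a Google one) only changes the middle stage. It also shows why transcription speed matters: every turn waits on all three stages in sequence.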
With conversational AI, ElevenLabs is in direct competition with OpenAI's Realtime API. Both are systems designed to make it easier for companies and organizations to add voice-based interactions to their products. That could mean answering calls in a call center or helping a customer learn about a product.
An example of a use case is children's toys, where models are trained to provide support and feedback in an age-appropriate manner.
Anyone with an ElevenLabs account can create a conversational agent. Four default templates are provided and can be customized at will.
One is a support agent called Eric designed to solve problems, another is a math tutor called Matilda, a third is a travel guide called George with information on most places in the world, and a fourth is a video game wizard with a mysterious voice.
I tried it out with a life coach agent, which I built from scratch and gave access to commonly used coaching tools such as habit tracking and goal setting. I used Gemini 1.5 Flash for reasons of speed and price.
Calling an agent costs 500 credits per minute during development. The starter plan gives you 30,000 credits for $4 per month.
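To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes every credit on the starter plan is spent on agent calls at the development rate quoted above; the constant and function names are just for illustration.

```python
# Figures quoted in the article.
CREDITS_PER_MINUTE = 500      # cost of an agent call during development
STARTER_CREDITS = 30_000      # credits included in the starter plan
STARTER_PRICE_USD = 4.00      # monthly price of the starter plan


def minutes_included(credits: int, credits_per_minute: int) -> int:
    """Whole minutes of agent calls a credit balance covers."""
    return credits // credits_per_minute


def cost_per_minute(price_usd: float, credits: int, credits_per_minute: int) -> float:
    """Effective dollar cost of one minute, assuming all credits go to calls."""
    return price_usd / (credits / credits_per_minute)


starter_minutes = minutes_included(STARTER_CREDITS, CREDITS_PER_MINUTE)
starter_rate = cost_per_minute(STARTER_PRICE_USD, STARTER_CREDITS, CREDITS_PER_MINUTE)

print(f"{starter_minutes} minutes, about ${starter_rate:.3f} per minute")
# prints "60 minutes, about $0.067 per minute"
```

In other words, the starter plan covers roughly an hour of conversation a month, at a little under seven cents a minute.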
Overall, setup is straightforward. There is a lot of flexibility in how you build, and agents appear in the sidebar of your ElevenLabs account. You can also import your Twilio phone number to connect to the voice assistant.
Just for fun, I created a customer support agent named Ryan using a clone of my voice. I'm going to see whether my father notices when I give him the phone number and tell him it's my new work number to call if he needs technical help.