Speech-to-Text (STT) enables Voice AI Agents to convert spoken language into text, which the agent then processes to generate accurate responses.
The Speech-to-Text (STT) setting applies only to voice-based agents and controls how the agent listens to and transcribes audio input during a call. Transcription is a crucial component of any voice-based agent, because the transcribed text is what the agent processes to produce its responses.
In the AI Voice Settings, you can choose the appropriate speech-to-text provider and language for your agent. These settings directly affect how the agent interprets and processes voice input during calls.
Each provider offers unique performance characteristics, including accuracy, speed, and language support. Depending on your use case, you can select the provider that best meets your needs.
Use Case
Choose your speech-to-text provider based on the complexity of the conversation. If your agent handles basic conversations, a simpler STT model might be sufficient. For more technical or nuanced conversations, consider a model that offers higher accuracy and broader language support.
Language
Make sure the selected language matches the language your customers speak during calls, as this improves transcription accuracy. Providers support different sets of languages, so confirm that your required language is available before choosing a provider.
Pricing Consideration
The STT provider and model you choose can affect your pricing. More advanced providers, or those supporting specialized languages, may incur higher costs. Evaluate your use case to choose the most cost-effective option without compromising on quality.
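The three considerations above (use case, language, and pricing) can be combined into a simple selection rule. The Python sketch below is illustrative only: the provider names, language codes, accuracy tiers, and per-minute prices are made-up placeholders rather than real offerings or platform settings, and choose_provider is a hypothetical helper, not part of the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SttProvider:
    name: str                # placeholder name, not a real vendor
    languages: frozenset     # language codes the provider supports
    accuracy_tier: int       # assumed scale: 1 = basic, 2 = higher accuracy
    price_per_minute: float  # illustrative figure, not real pricing


# Hypothetical catalog used only for this example.
CATALOG = [
    SttProvider("provider-basic", frozenset({"en", "es"}), 1, 0.010),
    SttProvider("provider-advanced", frozenset({"en", "es", "de", "ja"}), 2, 0.025),
]


def choose_provider(language: str, min_accuracy_tier: int,
                    catalog=CATALOG) -> Optional[SttProvider]:
    """Return the cheapest provider that supports the language and meets the tier."""
    candidates = [
        p for p in catalog
        if language in p.languages and p.accuracy_tier >= min_accuracy_tier
    ]
    # No provider supports this language at the required accuracy tier.
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.price_per_minute)
```

With this placeholder catalog, a basic English agent would resolve to the cheaper provider, while a request for a language that no provider lists returns None, which is the cue to revisit the language requirement before going live.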
Once configured, the STT provider is automatically applied to voice-based agent interactions. The agent listens during the conversation and converts spoken responses into text for further processing. This text can be used for generating responses or pulling information from a knowledge base, allowing the agent to engage effectively.
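The flow described above (listen, transcribe, then use the text to find an answer) can be pictured with a short Python sketch. It is a simplified illustration under stated assumptions, not how the platform is implemented: handle_turn and fake_transcribe are hypothetical names, the transcriber is a stand-in that just decodes bytes, and the knowledge base is a plain dictionary matched by keyword.

```python
from typing import Callable, Dict


def fake_transcribe(audio: bytes) -> str:
    """Stand-in for a real STT provider: treats the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")


def handle_turn(audio: bytes,
                transcribe: Callable[[bytes], str],
                knowledge_base: Dict[str, str],
                fallback: str = "Could you rephrase that?") -> str:
    """One conversational turn: audio -> text -> knowledge-base lookup -> reply."""
    # Step 1: the configured STT provider converts the audio into text.
    text = transcribe(audio).strip().lower()
    # Step 2: the text is matched against the knowledge base (naive keyword match).
    for keyword, answer in knowledge_base.items():
        if keyword in text:
            return answer
    # Step 3: nothing matched, so return a generic fallback reply.
    return fallback


kb = {"opening hours": "We are open 9 to 5."}
reply = handle_turn(b"What are your Opening Hours?", fake_transcribe, kb)
```

Because every later step works from the transcribed text, a transcription error in step 1 carries through to the lookup and the reply, which is why the provider and language settings matter.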