Build an AI Voice Agent by Integrating OpenAI's Real-time Speech API with Plivo
Plivo’s Audio Streaming API can be connected to OpenAI’s Real-time Speech-to-Speech (S2S) API. Together, they let you build AI voice assistants that hold natural conversations, handle interruptions gracefully, and respond to user queries in real time.
Get started with Plivo
Before beginning your AI voice assistant development, sign up for Plivo or sign in to your existing account. You’ll need to purchase a voice-enabled number through the Voice API or Plivo console.
Prerequisites
Ensure you have the following before starting:
- Node.js version 22.6.0 or later (download here)
- Python version 3.10.5 or later (download here)
- A Plivo account with a voice-enabled number
- An OpenAI account (sign up here) with:
  - A valid API key
  - Access to OpenAI’s Real-time API
- ngrok installed for local development testing
Clone the Plivo audio stream integration guides repository
Set Up Your Local Environment
- Create a Tunnel with ngrok: For local development, you need a public URL to receive webhooks. Open a terminal and run ngrok http 5000 to start an HTTP tunnel to the application’s port.
Copy the Forwarding URL (format: https://[your-ngrok-subdomain].ngrok.app). You’ll need this for the Plivo Answer XML.
Note: Port 5000 is this application’s default. If you change PORT in index.js (Node.js) or server.py (Python), update the ngrok command accordingly. Each new ngrok session creates a new URL, so you must update your configuration whenever you restart ngrok.
- Install Required Packages
If you are using Node.js:
- Configure Environment Variables
Create a .env file in your project root and set the following:
Add Plivo Credentials
Add OpenAI API Key
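As a minimal sketch of this step, the snippet below parses a .env file and reports which settings are missing. Only PLIVO_TO_NUMBER and PLIVO_ANSWER_XML are named in this guide; the other variable names (PLIVO_AUTH_ID, PLIVO_AUTH_TOKEN, PLIVO_FROM_NUMBER, OPENAI_API_KEY) are assumptions made for illustration, so check them against the sample repository before relying on them.

```python
from pathlib import Path

# PLIVO_TO_NUMBER and PLIVO_ANSWER_XML appear in this guide; the remaining
# names are assumptions made for illustration.
REQUIRED_VARS = (
    "PLIVO_AUTH_ID",
    "PLIVO_AUTH_TOKEN",
    "PLIVO_FROM_NUMBER",
    "PLIVO_TO_NUMBER",
    "PLIVO_ANSWER_XML",
    "OPENAI_API_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and bare words."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return the required names that are absent or empty."""
    return [name for name in REQUIRED_VARS if not values.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    settings = parse_env(env_path.read_text()) if env_path.exists() else {}
    print("Missing settings:", missing_vars(settings) or "none")
```

Running a check like this before starting the application catches an empty or misspelled variable early, rather than at call time.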
Configure Answer XML
Use this template for your Plivo application’s Answer XML:
Update the PLIVO_ANSWER_XML variable in your .env file with your Answer URL.
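For orientation, here is a hedged sketch of what such an Answer XML might contain: a Response with a single Stream element whose text is the WebSocket URL on your ngrok host. The /stream path and the attribute values chosen here (bidirectional, keepCallAlive, contentType) are assumptions for illustration; confirm them against the template in the sample repository and Plivo’s Stream XML reference.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def build_answer_xml(forwarding_url: str, ws_path: str = "/stream") -> str:
    """Build an Answer XML string that streams call audio to a WebSocket.

    forwarding_url is the https:// URL printed by ngrok; ws_path is a
    hypothetical route on the local server.
    """
    host = urlparse(forwarding_url).netloc
    if not host:
        raise ValueError(f"not an absolute URL: {forwarding_url!r}")
    response = ET.Element("Response")
    stream = ET.SubElement(
        response,
        "Stream",
        {
            # Assumed settings: two-way audio, 8 kHz mu-law, keep the call up.
            "bidirectional": "true",
            "keepCallAlive": "true",
            "contentType": "audio/x-mulaw;rate=8000",
        },
    )
    # The same ngrok host serves the WebSocket, reached over wss://.
    stream.text = f"wss://{host}{ws_path}"
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(build_answer_xml("https://your-ngrok-subdomain.ngrok.app"))
```

Generating the XML from the Forwarding URL keeps the WebSocket host in step with the tunnel, which matters because the URL changes with every ngrok session.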
Launch Your Application
- Ensure ngrok is running and you’ve noted the Forwarding URL
- Verify all environment variables are properly configured
- Start the application:
The application will automatically initiate a call to the number specified in PLIVO_TO_NUMBER. Once the call is answered, you can begin interacting with your AI assistant.
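The outbound call itself can be sketched as a request to Plivo’s Call API. The helper below only builds the request, so it can be inspected without placing a call. Treat it as an illustration rather than the sample application’s actual code: the function name and parameters are hypothetical, and the GET answer method is an assumption about how the Answer URL is hosted.

```python
import base64
import json
import urllib.request


def build_call_request(auth_id, auth_token, from_number, to_number, answer_url):
    """Build (but do not send) a POST request that asks Plivo to place a call."""
    url = f"https://api.plivo.com/v1/Account/{auth_id}/Call/"
    payload = {
        "from": from_number,
        "to": to_number,
        "answer_url": answer_url,   # where Plivo fetches the Answer XML
        "answer_method": "GET",     # assumption; use POST if your host expects it
    }
    # Plivo's REST API uses HTTP Basic auth with the Auth ID and Auth Token.
    credentials = base64.b64encode(f"{auth_id}:{auth_token}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; Plivo then fetches the Answer XML from answer_url once the call is answered.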
Key Features
Your AI voice assistant includes:
- Real-time audio streaming through Plivo’s WebSocket
- Natural voice communication using OpenAI’s Real-time model
- Intelligent interruption handling for natural conversation flow
- Function calling support for enhanced capabilities
- Bi-directional audio streaming for seamless interaction
Troubleshooting Guide
If you encounter issues:
- Check WebSocket Connection:
  - Verify ngrok is running
  - Confirm the WebSocket URL in your Answer XML matches your ngrok URL
  - Check for WebSocket connection errors in your logs
- Verify Environment Setup:
  - Confirm all environment variables are correctly set
  - Ensure OpenAI API key is valid
  - Verify Plivo credentials are correct
- Audio Issues:
  - Check audio stream configuration in Answer XML
  - Verify audio format compatibility
  - Monitor WebSocket data transfer logs
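One of these checks, confirming that the WebSocket URL in the Answer XML matches the current ngrok URL, is easy to automate. The following is a small sketch that assumes the Answer XML holds a Stream element directly under Response; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def stream_url_problems(answer_xml: str, forwarding_url: str) -> list[str]:
    """Return human-readable problems with the Stream URL, or an empty list."""
    stream = ET.fromstring(answer_xml).find("Stream")
    if stream is None or not (stream.text or "").strip():
        return ["Answer XML has no <Stream> URL"]
    target = urlparse(stream.text.strip())
    problems = []
    if target.scheme not in ("ws", "wss"):
        problems.append(f"Stream URL scheme is {target.scheme!r}, expected ws or wss")
    expected_host = urlparse(forwarding_url).netloc
    if target.netloc != expected_host:
        problems.append(
            f"Stream host {target.netloc!r} does not match ngrok host {expected_host!r}"
        )
    return problems


if __name__ == "__main__":
    xml_text = "<Response><Stream>wss://old-session.ngrok.app/stream</Stream></Response>"
    for problem in stream_url_problems(xml_text, "https://new-session.ngrok.app"):
        print(problem)
```

A stale host is a likely result of restarting ngrok without updating the Answer XML, so it is worth running this after each new session.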
Next Steps
Consider these enhancements for your AI assistant:
- Implement custom conversation flows
- Add specific business logic through function calling
- Create detailed conversation logs
- Add support for multiple languages
- Implement analytics and monitoring
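As one illustration of adding business logic through function calling, the sketch below registers a tool with the model and answers its calls. The get_business_hours function and its data are hypothetical, and the event and field names reflect one reading of OpenAI’s Real-time API reference (newer API versions may require extra session fields), so verify them against the version you use.

```python
import json


def get_business_hours(day: str) -> dict:
    """Hypothetical business logic; replace with a real lookup."""
    hours = {"saturday": "10:00-14:00", "sunday": "closed"}
    return {"day": day, "hours": hours.get(day.lower(), "09:00-17:00")}


TOOLS = [
    {
        "type": "function",
        "name": "get_business_hours",
        "description": "Look up opening hours for a day of the week.",
        "parameters": {
            "type": "object",
            "properties": {"day": {"type": "string"}},
            "required": ["day"],
        },
    }
]
HANDLERS = {"get_business_hours": get_business_hours}


def session_update() -> dict:
    """Message that advertises the tools to the model."""
    return {"type": "session.update", "session": {"tools": TOOLS, "tool_choice": "auto"}}


def handle_output_item(event: dict) -> list[dict]:
    """Run a requested tool and return the messages to send back, if any."""
    item = event.get("item", {})
    if event.get("type") != "response.output_item.done" or item.get("type") != "function_call":
        return []
    handler = HANDLERS.get(item.get("name"))
    if handler is None:
        return []
    result = handler(**json.loads(item.get("arguments") or "{}"))
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": json.dumps(result),
            },
        },
        # Ask the model to continue speaking, now using the tool result.
        {"type": "response.create"},
    ]
```

Keeping handlers in a dictionary keyed by tool name makes it straightforward to add further functions without touching the dispatch code.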
For additional support:
- Visit Plivo Documentation
- Check OpenAI API Documentation
- Contact Plivo Support for technical assistance