OpenAI Realtime (Speech-to-Speech)

Build a voice agent using OpenAI’s GPT-4o Realtime API for native speech-to-speech processing. OpenAI Realtime processes audio directly, without an intermediate text conversion step, which typically gives lower latency than a chained STT → LLM → TTS pipeline. Best for: applications that need minimal latency and native multimodal AI capabilities.

How Speech-to-Speech Differs

Standard Pipeline (STT → LLM → TTS):

Audio → Deepgram → Text → OpenAI → Text → Cartesia → Audio

Speech-to-Speech (Direct):

Audio → OpenAI Realtime → Audio

Speech-to-speech models process audio natively, preserving tone, emotion, and context that may be lost in text transcription.
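To make the difference concrete, here is a toy model in plain Python (not Pipecat code) that encodes each diagram as a list of stages. It counts the service hops and the audio-to-text conversions, which is where tone and emotion can be dropped. The `Stage` class and helper names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    consumes: str  # modality the stage takes in ("audio" or "text")
    produces: str  # modality the stage emits


# Stage lists mirror the two diagrams above.
CHAINED = [
    Stage("Deepgram (STT)", "audio", "text"),
    Stage("OpenAI (LLM)", "text", "text"),
    Stage("Cartesia (TTS)", "text", "audio"),
]
REALTIME = [Stage("OpenAI Realtime", "audio", "audio")]


def is_connected(pipeline) -> bool:
    """True if the pipeline takes audio in, returns audio out, and each hand-off matches."""
    if not pipeline or pipeline[0].consumes != "audio" or pipeline[-1].produces != "audio":
        return False
    return all(a.produces == b.consumes for a, b in zip(pipeline, pipeline[1:]))


def describe(pipeline):
    """Return (service hops, number of stages that turn audio into text)."""
    hops = len(pipeline)
    audio_to_text = sum(1 for s in pipeline if s.consumes == "audio" and s.produces == "text")
    return hops, audio_to_text


print(describe(CHAINED))   # (3, 1)
print(describe(REALTIME))  # (1, 0)
```

The chained pipeline makes three service calls per turn and passes through one transcription step; the speech-to-speech path makes one call and never reduces the caller's audio to text.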

Prerequisites

Service	What You Need
Plivo	Auth ID, Auth Token, Voice-enabled phone number
OpenAI	API key from platform.openai.com with Realtime API access

Installation

pip install "pipecat-ai[openai]"

Environment Variables

# Plivo credentials
PLIVO_AUTH_ID=your_auth_id
PLIVO_AUTH_TOKEN=your_auth_token
PLIVO_PHONE_NUMBER=+1234567890

# OpenAI credentials
OPENAI_API_KEY=sk-your_openai_key
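A missing key usually surfaces later as an authentication error in the middle of a call. A minimal sketch of a startup check, assuming the variable names above and an E.164-formatted phone number (as in the `+1234567890` placeholder), might look like this. The `config_problems` helper is hypothetical, not part of Pipecat or the example repo.

```python
import os
import re
from typing import Mapping, Optional

# Variable names taken from the environment file above.
REQUIRED_VARS = (
    "PLIVO_AUTH_ID",
    "PLIVO_AUTH_TOKEN",
    "PLIVO_PHONE_NUMBER",
    "OPENAI_API_KEY",
)

# E.164: a leading "+", then up to 15 digits, the first of which is non-zero.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def config_problems(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return human-readable configuration problems; an empty list means all is well."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name, "").strip()]
    number = env.get("PLIVO_PHONE_NUMBER", "").strip()
    if number and not E164.fullmatch(number):
        problems.append("PLIVO_PHONE_NUMBER should be in E.164 format, e.g. +1234567890")
    return problems
```

Call it at startup in `bot.py` (after `load_dotenv()` if you load `.env` with python-dotenv) and exit with the returned messages if the list is non-empty.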

Pipeline Configuration

import os

from pipecat.services.openai import OpenAIRealtimeLLMService

# Speech-to-speech service: one model handles recognition, reasoning, and synthesis
llm = OpenAIRealtimeLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    # voice="alloy",  # Optional; see Available Voices for supported options
)

OpenAI Realtime Features

Feature	Description
Minimal latency	Audio is processed directly, with no separate transcription or synthesis step
Voice activity detection	Server-side turn detection, including silence-based (`server_vad`) and semantic (`semantic_vad`) modes
Function calling	The model can call tools you define to reach external APIs
Multiple voices	Choose from built-in voices
Context management	Conversation history is kept across turns within a session
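Under the hood, voice, turn detection, and function calling are configured with a `session.update` event sent over the Realtime WebSocket; Pipecat's service builds this for you. The sketch below shows roughly what that event contains, assuming the beta Realtime event schema (newer API versions nest some fields differently, so treat the field layout as an assumption). The `get_order_status` tool and the instructions text are hypothetical.

```python
import json


def build_session_update(voice: str = "alloy") -> str:
    """Build a Realtime session.update event (beta schema) as a JSON string."""
    event = {
        "type": "session.update",
        "session": {
            "voice": voice,
            "instructions": "You are a helpful phone assistant. Keep replies short.",
            # Semantic VAD decides the caller has finished based on what they said,
            # not only on a fixed silence window.
            "turn_detection": {"type": "semantic_vad"},
            "tools": [
                {
                    "type": "function",
                    "name": "get_order_status",  # hypothetical tool for illustration
                    "description": "Look up the status of a customer's order.",
                    "parameters": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    },
                }
            ],
            "tool_choice": "auto",
        },
    }
    return json.dumps(event)
```

When the model decides to call a tool, your code runs the function and returns the result to the session; in Pipecat this is handled by registering a function handler on the service rather than by writing these events yourself.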

Available Voices

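Voice	Description
`alloy`	Neutral, balanced
`echo`	Warm, friendly
`shimmer`	Clear, professional

The Realtime API also offers `ash`, `ballad`, `coral`, `sage`, and `verse`, and newer models may add more; check OpenAI's Realtime documentation for the current list. `fable`, `onyx`, and `nova` belong to OpenAI's separate text-to-speech API and are not available in Realtime sessions.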

Architecture

With OpenAI Realtime, the pipeline is simplified:

Phone Call ↔ Plivo ↔ WebSocket ↔ Pipecat ↔ OpenAI Realtime

A single service handles:

Speech recognition
Language understanding
Response generation
Voice synthesis
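Pipecat's Plivo serializer and the Realtime service do the relaying between the two WebSockets for you, including any format conversion. Conceptually, each leg is a small message translation. The sketch below assumes Plivo's audio-streaming JSON shapes (`media` events inbound, `playAudio` events outbound) and the beta Realtime event names (`input_audio_buffer.append`, `response.audio.delta`). Forwarding μ-law bytes unchanged also assumes the Realtime session's audio formats are set to `g711_ulaw`. Function names are hypothetical.

```python
import base64
import json
from typing import Optional


def plivo_media_to_realtime(message: str) -> Optional[str]:
    """Translate one Plivo stream message into a Realtime input_audio_buffer.append event."""
    data = json.loads(message)
    if data.get("event") != "media":
        return None  # start/stop and other control events carry no audio
    payload = data["media"]["payload"]  # base64-encoded 8 kHz mu-law audio from the caller
    base64.b64decode(payload, validate=True)  # raise early on malformed audio
    return json.dumps({"type": "input_audio_buffer.append", "audio": payload})


def realtime_audio_to_plivo(event: str) -> Optional[str]:
    """Translate a Realtime audio delta into a Plivo playAudio message."""
    data = json.loads(event)
    if data.get("type") != "response.audio.delta":
        return None  # transcripts, tool calls, etc. are not sent to the caller
    return json.dumps(
        {
            "event": "playAudio",
            "media": {
                "contentType": "audio/x-mulaw",
                "sampleRate": 8000,
                "payload": data["delta"],  # base64 audio chunk from the model
            },
        }
    )
```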

Quick Start

Inbound Calls

git clone https://github.com/pipecat-ai/pipecat-examples.git
cd pipecat-examples/plivo-chatbot/inbound

# Configure environment
cp env.example .env
# Edit .env with Plivo and OpenAI credentials

# Modify bot.py to use OpenAIRealtimeLLMService
# Start server
uv sync && uv run server.py

# Expose with ngrok (development)
ngrok http 7860

Configure your Plivo number’s Answer URL to your ngrok URL.
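When Plivo requests the Answer URL, the server replies with Plivo XML that opens a bidirectional audio stream to the bot's WebSocket. The example's `server.py` already does this; the sketch below only shows the general shape, assuming the `<Stream>` attributes from Plivo's XML reference (`bidirectional`, `keepCallAlive`, `contentType`). The `answer_xml` helper and the WebSocket URL are hypothetical.

```python
import xml.etree.ElementTree as ET


def answer_xml(ws_url: str) -> str:
    """Return Plivo XML that streams call audio to ws_url in both directions."""
    response = ET.Element("Response")
    stream = ET.SubElement(
        response,
        "Stream",
        {
            "bidirectional": "true",  # allow the bot to send audio back into the call
            "keepCallAlive": "true",  # keep the call up while the stream is open
            "contentType": "audio/x-mulaw;rate=8000",  # 8 kHz mu-law telephony audio
        },
    )
    stream.text = ws_url
    return ET.tostring(response, encoding="unicode")


print(answer_xml("wss://example.ngrok.app/ws"))
```

Building the XML with `ElementTree` rather than string formatting escapes any special characters in the URL automatically.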

Outbound Calls

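# From the directory where you cloned pipecat-examples
cd pipecat-examples/plivo-chatbot/outbound

# Configure environment
cp env.example .env
# Edit .env with Plivo and OpenAI credentials

# Modify bot.py to use OpenAIRealtimeLLMService
# Start server
uv sync && uv run server.py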

# Initiate a call
curl -X POST http://localhost:7860/start \
  -H "Content-Type: application/json" \
  -d '{"phone_number": "+1234567890"}'
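To trigger calls from another Python program instead of `curl`, the same request can be built with the standard library. This sketch mirrors the `curl` command above (same URL, header, and JSON body); `build_start_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_start_request(
    phone_number: str, base_url: str = "http://localhost:7860"
) -> urllib.request.Request:
    """Build the POST /start request that asks the server to dial phone_number."""
    body = json.dumps({"phone_number": phone_number}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/start",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (sends the request; requires the server to be running):
# with urllib.request.urlopen(build_start_request("+1234567890")) as resp:
#     print(resp.status, resp.read().decode())
```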

When to Use OpenAI Realtime

Choose OpenAI Realtime when:

Latency is your top priority
You want the simplest integration
Built-in voices meet your needs
You’re already using OpenAI

Choose standard STT → LLM → TTS when:

You need specific voice characteristics (ElevenLabs cloning, Cartesia emotion)
You want to mix providers for cost optimization
You need fine-grained control over each component

Related

Pipecat Overview - Architecture and setup
OpenAI Realtime Docs - Full configuration
OpenAI Realtime Guide - Official documentation
