The <Stream> element streams raw audio from active calls over a WebSocket connection in near real-time. Use it for real-time speech processing, transcription, or AI voice applications.

Basic Usage

<Response>
    <Stream>wss://yourserver.example.com/audiostream</Stream>
</Response>

The equivalent using the Plivo Python SDK:

from plivo import plivoxml

response = plivoxml.ResponseElement()
response.add(plivoxml.StreamElement('wss://yourserver.example.com/audiostream'))
print(response.to_string())

Attributes

| Attribute | Type | Default | Description |
|---|---|---|---|
| bidirectional | boolean | false | Enable two-way audio: your server can both receive audio and send it back |
| audioTrack | string | inbound | Which audio to stream: inbound, outbound, or both |
| streamTimeout | integer | 86400 | Maximum stream duration in seconds |
| contentType | string | audio/x-l16;rate=8000 | Audio codec and sample rate |
| keepCallAlive | boolean | false | Execute subsequent XML elements only after the stream disconnects |
| extraHeaders | string | - | Custom key-value pairs sent to your WebSocket server |
| statusCallbackUrl | URL | - | URL that receives stream status events |
| statusCallbackMethod | string | POST | HTTP method used for the status callback |
| noiseCancellation | string | "false" | Enable noise cancellation: "true" or "false" |
| noiseCancellationLevel | integer | 85 | Noise reduction intensity (60-100). Applies only when noiseCancellation is "true" |
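This is a minimal Python sketch of how the attributes combine. It renders a `<Stream>` element with the standard library's `xml.etree.ElementTree` and checks two rules stated on this page: audioTrack must be one of the three listed values, and a bidirectional stream must use the inbound track. `build_stream_xml` is a hypothetical helper and is not part of the Plivo SDK. For production XML, the SDK shown under Basic Usage is the more natural choice.

```python
import xml.etree.ElementTree as ET

# Values allowed for audioTrack, per the attribute table.
VALID_TRACKS = {"inbound", "outbound", "both"}


def _xml_value(value):
    """Render Python booleans as the lowercase strings the XML uses."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_stream_xml(ws_url, **attrs):
    """Return <Response><Stream ...>ws_url</Stream></Response> as a string.

    Hypothetical helper: attribute names are passed through unchanged, so
    they must match the spelling in the attribute table.
    """
    if not ws_url.startswith(("ws://", "wss://")):
        raise ValueError("the stream URL must be a ws:// or wss:// URL")
    rendered = {name: _xml_value(value) for name, value in attrs.items()}
    track = rendered.get("audioTrack", "inbound")  # documented default
    if track not in VALID_TRACKS:
        raise ValueError(f"unknown audioTrack: {track!r}")
    # Bidirectional streams cannot use the outbound or both tracks.
    if rendered.get("bidirectional") == "true" and track != "inbound":
        raise ValueError("bidirectional streams require audioTrack='inbound'")

    response = ET.Element("Response")
    stream = ET.SubElement(response, "Stream", rendered)
    stream.text = ws_url
    return ET.tostring(response, encoding="unicode")


print(build_stream_xml("wss://ai.example.com/voice-agent",
                       bidirectional=True, keepCallAlive=True))
# <Response><Stream bidirectional="true" keepCallAlive="true">wss://ai.example.com/voice-agent</Stream></Response>
```

Python booleans become "true"/"false". String-typed attributes such as noiseCancellation can be passed as the strings the table lists.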

Audio Formats

| Content Type | Description |
|---|---|
| audio/x-l16;rate=8000 | Linear PCM, 8 kHz (default) |
| audio/x-l16;rate=16000 | Linear PCM, 16 kHz |
| audio/x-mulaw;rate=8000 | G.711 mu-law, 8 kHz |
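The content type determines how many bytes a given slice of audio occupies, which helps when sizing buffers on your WebSocket server. The sketch below makes three assumptions: the audio is mono, L16 uses 2 bytes per sample (16-bit linear PCM), and G.711 mu-law uses 1 byte per sample. The 20 ms frame in the usage lines is an arbitrary illustration, not the chunk size Plivo sends.

```python
# Bytes per sample: L16 is 16-bit linear PCM, G.711 mu-law is 8-bit.
# Assumes a single (mono) channel.
BYTES_PER_SAMPLE = {"audio/x-l16": 2, "audio/x-mulaw": 1}


def parse_content_type(content_type):
    """Split 'audio/x-l16;rate=8000' into ('audio/x-l16', 8000)."""
    codec, _, params = content_type.partition(";")
    codec = codec.strip()
    rate = None
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key == "rate":
            rate = int(value)
    if codec not in BYTES_PER_SAMPLE or rate is None:
        raise ValueError(f"unsupported content type: {content_type!r}")
    return codec, rate


def frame_size_bytes(content_type, duration_ms):
    """Number of bytes needed to hold duration_ms of audio."""
    codec, rate = parse_content_type(content_type)
    samples = rate * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE[codec]


for ct in ("audio/x-l16;rate=8000", "audio/x-l16;rate=16000",
           "audio/x-mulaw;rate=8000"):
    print(ct, frame_size_bytes(ct, 20))
# audio/x-l16;rate=8000 320
# audio/x-l16;rate=16000 640
# audio/x-mulaw;rate=8000 160
```

The same arithmetic gives bandwidth. One second of 16 kHz L16 audio is 32,000 bytes, twice the 8 kHz default.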

Bidirectional Streaming

Enable two-way audio for voice AI applications:
<Response>
    <Stream bidirectional="true" keepCallAlive="true">
        wss://ai.example.com/voice-agent
    </Stream>
</Response>
When bidirectional="true", your WebSocket server can send audio back into the call with a playAudio event:
{
    "event": "playAudio",
    "media": {
        "contentType": "audio/x-l16",
        "sampleRate": "8000",
        "payload": "<base64-encoded-audio>"
    }
}
When bidirectional is true, audioTrack cannot be outbound or both.
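Here is a minimal Python sketch that builds the playAudio message from raw audio bytes. It only covers encoding. `play_audio_message` is a hypothetical helper, and sending the resulting string is left to whichever WebSocket library your server uses. To match the example above, it sends sampleRate as a string.

```python
import base64
import json


def play_audio_message(audio: bytes, content_type="audio/x-l16",
                       sample_rate=8000) -> str:
    """Wrap raw audio bytes in a playAudio event (hypothetical helper)."""
    return json.dumps({
        "event": "playAudio",
        "media": {
            "contentType": content_type,
            # Sent as a string, matching the example message above.
            "sampleRate": str(sample_rate),
            "payload": base64.b64encode(audio).decode("ascii"),
        },
    })


# 20 ms of silence: 160 samples at 8 kHz, 2 bytes each for 16-bit L16.
silence = bytes(320)
message = play_audio_message(silence)
```

The payload must be in the format that contentType and sampleRate declare. Sending 16 kHz audio while labeling it 8000 would play at the wrong speed.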

Stream Both Directions

Capture audio from both parties:
<Response>
    <Stream audioTrack="both" streamTimeout="3600">
        wss://transcription.example.com/stream
    </Stream>
    <Speak>This call is being transcribed for quality purposes.</Speak>
</Response>

Status Callbacks

Monitor stream connection status:
<Response>
    <Stream
        statusCallbackUrl="https://example.com/stream-status/"
        statusCallbackMethod="POST">
        wss://yourserver.example.com/audiostream
    </Stream>
</Response>

Callback Events

Plivo sends a notification to statusCallbackUrl when:
  • The audio stream connects
  • The audio stream stops (intentionally or because streamTimeout is reached)
  • The audio stream fails or disconnects

Callback Parameters

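| Parameter | Description |
|---|---|
| bidirectional | Whether the stream is bidirectional |
| audioTrack | Which audio tracks are streamed |
| streamTimeout | Maximum stream duration in seconds |
| contentType | Audio codec and sample rate used |
| extraHeaders | Custom headers sent to the WebSocket server |
| keepCallAlive | Whether subsequent XML waits for the stream to end |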
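This is a minimal sketch of decoding a status callback in Python. It assumes the POST body is form-encoded (`application/x-www-form-urlencoded`) and that boolean parameters arrive as the strings "true"/"false". This page specifies neither, so check a real callback before relying on them. Only the parameter names come from the table above, and `parse_stream_status` is a hypothetical helper.

```python
from urllib.parse import parse_qs

# Callback parameters from the table above that carry true/false values.
BOOLEAN_PARAMS = {"bidirectional", "keepCallAlive"}


def parse_stream_status(body: bytes) -> dict:
    """Decode a stream status callback body into typed Python values.

    Assumes a form-encoded body with "true"/"false" boolean strings.
    """
    # keep_blank_values keeps parameters such as an empty extraHeaders.
    parsed = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    fields = {name: values[0] for name, values in parsed.items()}
    for name in fields.keys() & BOOLEAN_PARAMS:
        fields[name] = fields[name].lower() == "true"
    if fields.get("streamTimeout"):
        fields["streamTimeout"] = int(fields["streamTimeout"])
    return fields


# Illustrative body built from the documented parameter names.
body = (b"bidirectional=false&audioTrack=inbound&streamTimeout=86400"
        b"&contentType=audio%2Fx-l16%3Brate%3D8000&keepCallAlive=true")
status = parse_stream_status(body)
```

In a web framework, pass the raw request body to this function. If statusCallbackMethod is GET, parse the query string instead.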

Custom Headers

Pass metadata to your WebSocket server:
<Response>
    <Stream extraHeaders="userId=12345,sessionId=abc123">
        wss://yourserver.example.com/audiostream
    </Stream>
</Response>
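Constraints:
  • Max length: 512 bytes
  • Allowed characters in keys and values: [A-Z], [a-z], [0-9]. Separate each key from its value with = and separate pairs with a comma, as in the example above.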
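Here is a minimal sketch that serializes a dict into the `key=value,key=value` form from the example and enforces the constraints above. It assumes the 512-byte limit applies to the whole attribute value, and that `=` and `,` are the only characters allowed outside keys and values. `build_extra_headers` is a hypothetical helper.

```python
import re

MAX_EXTRA_HEADERS_BYTES = 512
ALPHANUMERIC = re.compile(r"[A-Za-z0-9]+")


def build_extra_headers(headers: dict) -> str:
    """Serialize headers as key=value pairs joined by commas."""
    for key, value in headers.items():
        for part in (str(key), str(value)):
            # Keys and values may only use [A-Z], [a-z] and [0-9].
            if not ALPHANUMERIC.fullmatch(part):
                raise ValueError(f"invalid characters in {part!r}")
    joined = ",".join(f"{key}={value}" for key, value in headers.items())
    # Assumption: the 512-byte limit covers the whole attribute value.
    if len(joined.encode("utf-8")) > MAX_EXTRA_HEADERS_BYTES:
        raise ValueError("extraHeaders must not exceed 512 bytes")
    return joined


print(build_extra_headers({"userId": 12345, "sessionId": "abc123"}))
# userId=12345,sessionId=abc123
```

Rejecting bad input before generating XML surfaces the error in your code rather than at call time. Values such as UUIDs with hyphens must be re-encoded (for example, with the hyphens stripped).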

Keep Call Alive

Wait for the stream to end before executing the next XML element:
<Response>
    <Stream keepCallAlive="true">
        wss://ai.example.com/conversation
    </Stream>
    <Speak>Thank you for using our AI assistant.</Speak>
</Response>
When keepCallAlive="true":
  • Stream element runs exclusively
  • Subsequent XML executes only after stream disconnects

Noise Cancellation

Filter out background noise in real-time to improve voice clarity and transcription accuracy for voice agent applications in noisy environments.
Noise cancellation is an account-level feature. Contact your Plivo account manager or email support@plivo.com to enable it before using these attributes.
<Response>
    <Stream bidirectional="true"
            keepCallAlive="true"
            noiseCancellation="true"
            noiseCancellationLevel="85">
        wss://ai.example.com/voice-agent
    </Stream>
</Response>
Choosing a cancellation level:
| Level range | Environment | Notes |
|---|---|---|
| 60-70 | Quiet (home, office) | Light filtering, preserves voice detail |
| 70-85 | Moderate noise | Good balance for most use cases (default: 85) |
| 85-100 | Heavy noise (traffic, crowds) | Aggressive filtering, may introduce minor artifacts |
Start with the default value of 85. Increase toward 100 for heavy background noise. Decrease toward 60 if you notice audio artifacts or voice distortion.
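noiseCancellation is a string attribute ("true"/"false"), while noiseCancellationLevel is an integer that only matters when cancellation is on. Building the pair in one place avoids mixing them up. This minimal Python sketch uses a hypothetical `noise_cancellation_attrs` helper and the standard library's ElementTree to render the example above. The feature must still be enabled on your account first.

```python
import xml.etree.ElementTree as ET


def noise_cancellation_attrs(enabled=True, level=85):
    """Return noise cancellation attributes for a <Stream> element."""
    if not enabled:
        # The level only applies when noiseCancellation is "true".
        return {"noiseCancellation": "false"}
    if isinstance(level, bool) or not isinstance(level, int):
        raise TypeError("noiseCancellationLevel must be an integer")
    if not 60 <= level <= 100:
        raise ValueError("noiseCancellationLevel must be between 60 and 100")
    return {"noiseCancellation": "true", "noiseCancellationLevel": str(level)}


stream = ET.Element("Stream", {"bidirectional": "true",
                               "keepCallAlive": "true",
                               **noise_cancellation_attrs(level=85)})
stream.text = "wss://ai.example.com/voice-agent"
print(ET.tostring(stream, encoding="unicode"))
# <Stream bidirectional="true" keepCallAlive="true" noiseCancellation="true" noiseCancellationLevel="85">wss://ai.example.com/voice-agent</Stream>
```

To tune the level, change only the `level` argument, staying within the 60-100 range from the table.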

Use Cases

| Scenario | Configuration |
|---|---|
| Real-time transcription | audioTrack="both", contentType="audio/x-l16;rate=16000" |
| Voice AI agent | bidirectional="true", keepCallAlive="true" |
| Voice AI in noisy environments | bidirectional="true", keepCallAlive="true", noiseCancellation="true" |
| Call monitoring | audioTrack="inbound" |
| Quality analysis | audioTrack="both" |

WebSocket Events

Your WebSocket server receives:
| Event | Description |
|---|---|
| Connection | Initial metadata about the stream and call |
| Media | Base64-encoded audio chunks with contentType, sampleRate, and payload |
| Stop | Notification when the stream ends |
For the detailed event protocol, see Stream Event Protocol.
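This is a library-agnostic Python sketch of handling these events: feed each text message your WebSocket server receives into `StreamSession.handle`. The exact `event` strings and the location of the media payload are defined in Stream Event Protocol, not on this page. The names used below are assumptions to adjust: "start", "media", and "stop", with the payload at `media.payload` (mirroring the playAudio message). `StreamSession` is a hypothetical class.

```python
import base64
import json

# Assumed event names; the table above calls them Connection, Media and
# Stop. Replace these with the strings from Stream Event Protocol.
EVENT_START, EVENT_MEDIA, EVENT_STOP = "start", "media", "stop"


class StreamSession:
    """Collects decoded audio from one stream (hypothetical helper)."""

    def __init__(self):
        self.metadata = None
        self.audio = bytearray()
        self.stopped = False

    def handle(self, raw_message: str) -> None:
        msg = json.loads(raw_message)
        event = msg.get("event")
        if event == EVENT_START:
            # Keep the initial stream and call metadata.
            self.metadata = msg
        elif event == EVENT_MEDIA:
            # Assumes the payload sits under media.payload, mirroring the
            # playAudio message format shown earlier.
            self.audio += base64.b64decode(msg["media"]["payload"])
        elif event == EVENT_STOP:
            self.stopped = True
        # Other events are ignored in this sketch.
```

A real server would pass `self.audio` on to a transcription or AI pipeline incrementally instead of buffering the whole call in memory.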