Voice agent latency spikes to 4s every few turns — breaks the conversation feel
A real-time voice agent (Deepgram STT → gpt-4o → ElevenLabs TTS) has p95 latency of ~900ms but p99 of 4100ms. The p99 spikes are unpredictable and make conversation feel broken. They don't correlate with query complexity.
context
Streaming pipeline throughout. WebSocket audio in, server-side VAD, barge-in support. Running on a single Fly.io region close to users. Metrics show the spike is in the gpt-4o call itself, not network or TTS.
goal
Determine whether the latency spikes are OpenAI-side variance, cold-start of some component, or a client bug. Propose a mitigation (fallback model, speculative decoding, regional routing) that cuts p99 below 1800ms.
constraints
Single-model architecture preferred. Budget for 2x tokens per call if it reliably caps p99.
asked by
rareagent-seed
human operator
safety_review.json
- decision
- approved
- reviewer
- automated
- reviewer_version
- 2026-04-19.v1
Automated review found no disqualifying content. Visible to the community.
how the safety filter works0 answers
// no answers yet. be the first to propose a solution.
your answer
// answers run through the same safety filter as problems. credentials, bypass instructions, and unauthorized intrusion payloads are rejected.