StrawGo — a real-time voice AI framework in Go
Open source · Author · square-key-labs · 2025 – present
My open-source Go framework for building real-time conversational voice agents — a frame-based pipeline that fits far more concurrent calls on a machine than the Python tools it learns from.
StrawGo is an open-source framework I’m building for real-time conversational voice AI in Go. If you’ve ever wired up a voice agent in Python, you know the ceiling you eventually hit: at real call concurrency, the runtime starts fighting you. I wanted to see how far a goroutine-native design could push that ceiling.
What it is
A frame-based pipeline where every stage of a call — audio in, noise suppression, voice-activity detection, turn-taking, the model, speech out — is a composable processor, with audio frames flowing between stages over channels. It speaks the common telephony and WebRTC transports and plugs into a wide range of speech-to-text, text-to-speech, and language-model providers.
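To make the shape of that concrete, here is a minimal sketch of a channel-connected frame pipeline. It is not StrawGo's actual API: `Frame`, `Processor`, `Stage`, `Chain`, `Run`, and the two toy stages are hypothetical names invented for illustration, and the stages are stand-ins for real denoising and voice detection.

```go
package main

import "fmt"

// Frame is a hypothetical unit of audio moving through the pipeline.
type Frame struct {
	Seq     int
	Samples []int16
}

// Processor is a hypothetical stage type: it consumes a channel of frames
// and returns a channel of processed frames.
type Processor func(in <-chan Frame) <-chan Frame

// Stage lifts a per-frame function into a Processor that runs in its own
// goroutine. Returning false from fn drops the frame.
func Stage(fn func(Frame) (Frame, bool)) Processor {
	return func(in <-chan Frame) <-chan Frame {
		out := make(chan Frame)
		go func() {
			defer close(out) // propagate end-of-stream downstream
			for f := range in {
				if g, keep := fn(f); keep {
					out <- g
				}
			}
		}()
		return out
	}
}

// Chain composes processors left to right into a single Processor.
func Chain(stages ...Processor) Processor {
	return func(in <-chan Frame) <-chan Frame {
		for _, s := range stages {
			in = s(in)
		}
		return in
	}
}

// Halve is a toy stand-in for a real audio stage: it attenuates each sample.
func Halve(f Frame) (Frame, bool) {
	out := make([]int16, len(f.Samples))
	for i, s := range f.Samples {
		out[i] = s / 2
	}
	return Frame{Seq: f.Seq, Samples: out}, true
}

// DropSilence is a toy gate standing in for voice-activity detection:
// it drops frames whose samples are all zero.
func DropSilence(f Frame) (Frame, bool) {
	for _, s := range f.Samples {
		if s != 0 {
			return f, true
		}
	}
	return f, false
}

// Run feeds frames into the pipeline and collects everything that comes out.
func Run(p Processor, frames []Frame) []Frame {
	in := make(chan Frame)
	go func() {
		defer close(in)
		for _, f := range frames {
			in <- f
		}
	}()
	var got []Frame
	for f := range p(in) {
		got = append(got, f)
	}
	return got
}

func main() {
	pipeline := Chain(Stage(Halve), Stage(DropSilence))
	out := Run(pipeline, []Frame{
		{Seq: 0, Samples: []int16{100, -200}},
		{Seq: 1, Samples: []int16{0, 0}},
		{Seq: 2, Samples: []int16{8, 4}},
	})
	for _, f := range out {
		fmt.Println(f.Seq, f.Samples)
	}
}
```

Each stage owns one goroutine and one output channel, so adding a stage means adding one more argument to `Chain`. Closing the input channel shuts the whole pipeline down stage by stage.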
What’s interesting about it
- An order of magnitude more throughput per machine, at a fraction of the memory, versus the Python framework it takes inspiration from. For a workload measured in concurrent live calls, that is the whole game.
- An in-process audio pipeline (denoise → voice detection → turn detection) running ONNX models directly, rather than shelling out to separate services.
- Honest benchmarking baked in, including a denoiser shootout where the faster model was deliberately rejected because it hurt voice detection in noisy conditions. (I wrote about that here.)
- The hard edges done on purpose: a dedicated guard for stale audio, a rate-limited pacer, and a set of strategies for the deceptively tricky question of when the agent has actually finished speaking.
It’s the kind of engineering I find most fun: real-time, resource-bound, and judged by whether a live phone call feels natural.