StrawGo — a real-time voice AI framework in Go

Open source

Author · square-key-labs · 2025 – present

My open-source Go framework for building real-time conversational voice agents — a frame-based pipeline that fits far more concurrent calls on a machine than the Python tools it learns from.

Go · Real-time voice AI · WebRTC · Audio · Open source


StrawGo is an open-source framework I’m building for real-time conversational voice AI in Go. If you’ve ever wired up a voice agent in Python, you know the ceiling you eventually hit: at real call concurrency, the runtime starts fighting you. I wanted to see how far a goroutine-native design could push that ceiling.

What it is

A frame-based pipeline where every stage of a call — audio in, noise suppression, voice-activity detection, turn-taking, the model, speech out — is a composable processor, with audio frames flowing between stages over channels. It speaks the common telephony and WebRTC transports and plugs into a wide range of speech-to-text, text-to-speech, and language-model providers.
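To make the shape concrete, here is a minimal sketch of a frame pipeline built from goroutines and channels. It is not StrawGo's actual API: `Frame`, `Processor`, `Stage`, `Chain`, `Attenuate`, `Gate` and `Run` are hypothetical names, and the two toy stages only stand in for real denoising and voice-activity detection.

```go
package main

import "fmt"

// Frame is a hypothetical unit of audio moving through the pipeline.
type Frame struct {
	Seq     int
	Samples []int16
}

// Processor is a hypothetical stage: it consumes one stream and produces another.
type Processor func(in <-chan Frame) <-chan Frame

// Stage lifts a per-frame function into a Processor that runs in its own
// goroutine. Returning false from fn drops the frame.
func Stage(fn func(Frame) (Frame, bool)) Processor {
	return func(in <-chan Frame) <-chan Frame {
		out := make(chan Frame, 8) // small buffer so stages overlap
		go func() {
			defer close(out) // closing propagates shutdown downstream
			for f := range in {
				if g, ok := fn(f); ok {
					out <- g
				}
			}
		}()
		return out
	}
}

// Chain wires processors together left to right.
func Chain(in <-chan Frame, ps ...Processor) <-chan Frame {
	for _, p := range ps {
		in = p(in)
	}
	return in
}

// Attenuate halves every sample; a toy stand-in for a denoiser.
func Attenuate() Processor {
	return Stage(func(f Frame) (Frame, bool) {
		out := make([]int16, len(f.Samples))
		for i, v := range f.Samples {
			out[i] = v / 2
		}
		return Frame{Seq: f.Seq, Samples: out}, true
	})
}

// Gate drops frames whose peak amplitude is below threshold; a crude
// stand-in for voice-activity detection, which would normally run a model.
func Gate(threshold int) Processor {
	return Stage(func(f Frame) (Frame, bool) {
		peak := 0
		for _, v := range f.Samples {
			a := int(v)
			if a < 0 {
				a = -a
			}
			if a > peak {
				peak = a
			}
		}
		return f, peak >= threshold
	})
}

// Run feeds frames through the chain and collects whatever survives.
func Run(frames []Frame, ps ...Processor) []Frame {
	src := make(chan Frame)
	go func() {
		defer close(src)
		for _, f := range frames {
			src <- f
		}
	}()
	var out []Frame
	for f := range Chain(src, ps...) {
		out = append(out, f)
	}
	return out
}

func main() {
	frames := []Frame{
		{Seq: 0, Samples: []int16{4, -6}},
		{Seq: 1, Samples: []int16{2000, -3000}},
		{Seq: 2, Samples: []int16{10, 0}},
	}
	for _, f := range Run(frames, Attenuate(), Gate(500)) {
		fmt.Println(f.Seq, f.Samples) // prints "1 [1000 -1500]"
	}
}
```

Each stage owns one goroutine and one output channel, so adding or reordering stages is a change to the `Chain` call rather than to the stages themselves. Closing the source channel drains and shuts down the whole chain in order.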

What’s interesting about it

  • An order of magnitude more throughput per machine, at a fraction of the memory, versus the Python framework it takes inspiration from — which, for a workload measured in concurrent live calls, is the whole game.

  • An in-process audio pipeline (denoise → voice detection → turn detection) running ONNX models directly, rather than shelling out to separate services.

  • Honest benchmarking baked in — including a denoiser shootout where the faster model was deliberately rejected because it hurt voice detection in noisy conditions. (I wrote about that here.)

  • The hard edges done on purpose — a dedicated guard for stale audio, a rate-limited pacer, and a set of strategies for the deceptively tricky question of when the agent has actually finished speaking.

It’s the kind of engineering I find most fun: real-time, resource-bound, and judged by whether a live phone call feels natural.