Choosing a Voice AI Library in 2026: What's Actually Worth Building On

TTS, speech recognition, and voice agents — scored on quality daily. A decision guide based on which projects are thriving, which are dying, and what the trends mean for your stack.

Graham Rowe · March 31, 2026 · Updated daily with live data
voice-ai agents nlp

You need voice in your application. Maybe text-to-speech, maybe speech recognition, maybe a full voice agent. You search GitHub, find dozens of projects with thousands of stars, and now you need to decide which one to actually build on.

PT-Edge tracks 9,679 voice AI repositories and scores them daily on maintenance, adoption, maturity, and community. This guide uses that data to help you make the decision: what to use, what to avoid, and where the space is heading.

The TTS decision: five real options in 2026

Text-to-speech has the most activity in voice AI — and the most confusion. Hundreds of repos, but the actual decision comes down to what you're optimising for.

If you need production quality and don't mind paying: ElevenLabs

elevenlabs-python (92/100, 2,887 stars) is the dominant API client. It's actively maintained (8 commits in the last 30 days), and the voice quality is the industry benchmark. The trade-off is cost and vendor lock-in. If you're building a commercial product and voice quality is the priority, this is the safe bet.
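The call shape looks roughly like the sketch below, assuming the current (v1-style) SDK; the SDK surface has changed across major versions, and the `voice_id` here is a placeholder you'd replace with one from your account.

```python
import os

def speak(text: str, out_path: str = "out.mp3") -> None:
    """Sketch of elevenlabs-python usage; treat as a shape, not gospel."""
    from elevenlabs.client import ElevenLabs  # lazy import: sketch only

    client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
    audio = client.text_to_speech.convert(
        voice_id="your-voice-id",            # placeholder
        model_id="eleven_multilingual_v2",   # example model name
        text=text,
    )
    with open(out_path, "wb") as f:
        for chunk in audio:  # the client streams audio as byte chunks
            f.write(chunk)

# speak("Hello from ElevenLabs")  # requires ELEVENLABS_API_KEY
```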

If you need free and good-enough: the Edge TTS ecosystem

Microsoft's Edge TTS API offers surprisingly high-quality voice synthesis at zero cost, and an entire wrapper ecosystem has grown around it.

| Project | Score | Stars | What it does |
| --- | --- | --- | --- |
| edge-tts | 76/100 | 10,304 | Use Microsoft Edge's online text-to-speech service from Python WITHOUT... |
| edge-tts-universal | 54/100 | 59 | Use Microsoft Edge's online text-to-speech service in Node.js, browsers, or... |
| pyvideotrans | 70/100 | 16,496 | Translate the video from one language to another and embed dubbing & subtitles. |
| Microsoft-Edge-TTS-Simple-Gui | 11/100 | 0 | The dynamic TTS preview and management tool, the Microsoft Edge TTS... |

edge-tts (76/100, 10,304 stars) is the foundation — a Python library that wraps the Edge TTS service. Around it, developers have built GUI wrappers, cross-platform integrations, and tools like pyvideotrans (16,496 stars) that use Edge TTS as their voice backend for video translation.

The risk: this entire ecosystem depends on an unofficial API. Microsoft hasn't shut it down, and the volume of projects building on it suggests they're aware and tolerating it. But there's no guarantee it stays free or available.
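For a sense of how lightweight the library is, here is a minimal sketch of edge-tts usage (the voice name is one of Microsoft's neural voices; swap in whichever you prefer):

```python
import asyncio

async def synthesize(text: str, voice: str = "en-US-AriaNeural",
                     out_path: str = "hello.mp3") -> None:
    # Lazy import so the sketch reads even without the package installed
    import edge_tts  # pip install edge-tts

    communicate = edge_tts.Communicate(text, voice)
    await communicate.save(out_path)  # streams the audio to disk

# asyncio.run(synthesize("Hello from Edge TTS"))  # needs network access
```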

If you need to run locally: sherpa-onnx or mlx-audio

On-device inference is where voice AI is moving fastest. Two projects stand out:

sherpa-onnx (91/100, 10,885 stars) runs TTS and ASR on everything from phones to Raspberry Pis via ONNX Runtime. It's the deployment play — when you need voice processing without a network connection or a cloud bill. With 138 commits in the last 30 days, it's under serious active development.

mlx-audio (93/100, 6,227 stars) is Apple Silicon-native audio built on MLX. If you're targeting Macs or building developer tools that run locally, this is the emerging choice. It's newer than sherpa-onnx, but its quality score is already high and climbing.

If you're doing research: ESPnet

ESPnet scores 96/100 — the highest in the entire voice AI domain. It's a full end-to-end speech processing toolkit covering TTS, ASR, speech translation, speech enhancement, and speaker diarisation. With 9,768 stars and 109 commits in the last 30 days, it's the project with the most sustained investment behind it.

The trade-off: ESPnet is built for researchers and assumes you know what you're doing. It's not the right choice if you want a TTS library you can pip install and call in three lines. But if you're training models or benchmarking architectures, nothing else comes close to the breadth of supported tasks and recipes.

What about Coqui TTS?

If you search "best open source TTS", every list still recommends Coqui TTS. It has 44,801 stars. But Coqui the company shut down in 2024, and the repo hasn't been meaningfully updated since. Our maintenance score reflects this: the overall quality has dropped because nobody is fixing bugs, reviewing PRs, or updating dependencies.

This is the pattern to watch for in voice AI. A project can have tens of thousands of stars and still be a bad dependency choice. The question isn't "how popular is it?" but "is someone still working on it?" Mozilla DeepSpeech (26,741 stars) and Mozilla TTS tell the same story — Mozilla wound down its voice ambitions years ago, but the repos still dominate star-ranked search results.

Speech recognition: a settled landscape with one clear winner

ASR is more consolidated than TTS. The Whisper family dominates, and the decision is mostly about which Whisper derivative fits your deployment constraints.

| Project | Score | Stars | Best for |
| --- | --- | --- | --- |
| whisperX | 90/100 | 20,758 | Fast Python transcription with speaker diarisation |
| whisper.cpp | 72/100 | 47,665 | C/C++ inference, edge deployment, minimal dependencies |
| sherpa-onnx | 91/100 | 10,885 | Cross-platform mobile/embedded deployment |
| espnet | 96/100 | 9,768 | Research, model training, benchmarking |

WhisperX (90/100) is the practical choice for most Python developers — it wraps Whisper with word-level timestamps and speaker diarisation. whisper.cpp (47,665 stars) is the performance play — pure C/C++ inference from the same developer who built llama.cpp. If you're building a product that needs to run on devices, these two plus sherpa-onnx cover the entire deployment spectrum.
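The whisperX flow is two passes: transcribe, then force-align for word-level timestamps. A hedged sketch (model size, compute type, and batch size are illustrative, not tuned values):

```python
def transcribe(audio_path: str, device: str = "cpu"):
    """Sketch of whisperX: fast transcription plus word-level alignment."""
    import whisperx  # lazy import: sketch only

    model = whisperx.load_model("small", device, compute_type="int8")
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, batch_size=8)

    # Second pass: forced alignment produces word-level timestamps
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    return whisperx.align(result["segments"], align_model, metadata,
                          audio, device)

# result = transcribe("meeting.wav")
```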

Voice agents: the frontier that's still mostly experiments

Connecting LLMs to voice interfaces is the most exciting and least mature part of the voice AI landscape. We track 324 repos categorised as voice assistants, but the average quality is low — most are demo projects that wire together an STT model, an LLM, and a TTS model into a single script.
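The "single script" pattern those demos share reduces to one function composition. A pure-Python skeleton with stub components (the stubs stand in for a real STT model, LLM API, and TTS engine; none of this is any particular project's API):

```python
from typing import Callable

def make_voice_agent(stt: Callable[[bytes], str],
                     llm: Callable[[str], str],
                     tts: Callable[[str], bytes]) -> Callable[[bytes], bytes]:
    """Wire STT -> LLM -> TTS into one conversational turn."""
    def one_turn(audio_in: bytes) -> bytes:
        transcript = stt(audio_in)   # audio in, text out
        reply = llm(transcript)      # text in, text out
        return tts(reply)            # text in, audio out
    return one_turn

# Stubs so the skeleton runs without any models installed
agent = make_voice_agent(
    stt=lambda audio: audio.decode(),       # pretend transcription
    llm=lambda text: f"You said: {text}",   # pretend model reply
    tts=lambda text: text.encode(),         # pretend synthesis
)

print(agent(b"hello"))  # b'You said: hello'
```

Everything the tooling projects below address — evaluation, debugging, latency — lives in the gaps between these three calls, which is exactly what a one-script demo glosses over.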

The projects worth watching are the ones building infrastructure rather than demos:

| Project | Score | Stars | What it does |
| --- | --- | --- | --- |
| voice-devtools | 30/100 | 50 | Developer tools to debug and build realtime voice agents. Supports multiple models. |
| autovoiceevals | 47/100 | 83 | A self-improving loop for voice AI agents. Uses karpathy's autoresearch as... |

voice-devtools from Outspeed is building developer tooling for voice agents — the debugging and testing layer that you need when voice goes from demo to production. autovoiceevals tackles the evaluation problem: how do you measure whether a voice AI system is actually good? These are infrastructure projects, not applications, and infrastructure projects are what turn an ecosystem from "cool demos" into "things you can ship."

Google Search Console confirms the interest: autovoiceevals is our best-positioned page on the entire site at position 3.1 with 23 impressions. People are looking for voice AI evaluation tools and not finding much.

Three trends shaping voice AI right now

1. On-device is becoming real

sherpa-onnx and mlx-audio represent a shift from "voice AI requires a cloud API" to "voice AI runs on your phone." This matters because latency kills voice interactions — a 200ms round trip to a cloud TTS endpoint is the difference between a voice assistant that feels natural and one that feels broken. The projects with the highest quality scores in voice AI are increasingly the ones focused on local inference.
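The latency point is easy to make concrete with a back-of-envelope per-turn budget. The numbers below are illustrative assumptions, not measurements, but the shape of the comparison holds: every cloud hop adds a round trip.

```python
# Rough per-turn latency budget, in milliseconds (illustrative numbers)
cloud_ms = {
    "stt_round_trip": 200,
    "llm_first_token": 300,
    "tts_round_trip": 200,
}
local_ms = {
    "stt_on_device": 80,
    "llm_first_token": 300,  # same model latency either way
    "tts_on_device": 50,
}

print(sum(cloud_ms.values()))  # 700
print(sum(local_ms.values()))  # 430
```

Shaving the speech legs alone takes a turn from "noticeable pause" toward "conversational", which is why the highest-scoring projects cluster around local inference.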

2. The free tier is Edge TTS

The Edge TTS wrapper ecosystem is growing because it solves a real problem: good voice synthesis with zero cost. For prototypes, internal tools, and applications where you need voice but can't justify an API subscription, Edge TTS is becoming the default choice. The risk is platform dependency, but that hasn't slowed adoption.

3. Voice agents need tooling, not more demos

There are hundreds of "voice chatbot" repos. What's missing is the tooling layer: evaluation frameworks, debugging tools, latency monitoring, conversation analytics. The voice agent space won't mature until these infrastructure projects exist and reach quality. Watch the voice agent applications category for projects that build tools rather than demos.

How to use this data

Every project mentioned in this guide has a quality-scored page in our voice AI directory, updated daily. You can:

  • Browse all 166 voice AI categories to find projects in your specific niche
  • Check trending voice AI projects to see what's gaining momentum this week
  • Compare any two projects side by side on maintenance, adoption, maturity, and community scores

Quality scores update daily from live GitHub, PyPI, and npm data. If a project stops being maintained, the score drops. If a project starts gaining real adoption, the score rises. The data does the work so you don't have to manually check each repo's commit history before making a decision.

Related analysis