FluidInference/FluidAudio
Frontier CoreML audio models for your apps: text-to-speech, speech-to-text, voice activity detection, and speaker diarization. Written in Swift and built on state-of-the-art open-source models.
Inference is offloaded to the Apple Neural Engine (ANE), which keeps CPU and GPU usage low and conserves battery on always-on workloads. Features include streaming ASR with end-of-utterance detection, inverse text normalization for post-processing, and both online and offline speaker diarization pipelines with advanced clustering. All models are open source (MIT or Apache 2.0) and hosted on Hugging Face; transcription supports 25 languages and TTS supports 9, with straightforward Swift integration.
1,689 stars. Actively maintained with 90 commits in the last 30 days.
Stars: 1,689
Forks: 214
Language: Swift
License: Apache-2.0
Last pushed: Mar 18, 2026
Commits (30d): 90
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/FluidInference/FluidAudio"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
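The request above can be wrapped in a small shell helper for use with other repositories. This is only a sketch: the function names `quality_url` and `fetch_quality` are hypothetical, and the path layout (category, then owner, then repository) is inferred from the single example URL rather than from any API documentation. The response format and the way an API key is supplied are not described here, so the sketch prints the response body as-is and sends no key.

```shell
#!/bin/sh
# Hypothetical helper: build the endpoint URL from a category, owner, and repo.
# Assumption: the path layout mirrors the single example URL shown above.
quality_url() {
  printf 'https://pt-edge.onrender.com/api/v1/quality/%s/%s/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: fetch the data for one repository.
# -f makes curl exit non-zero on HTTP error responses (400 and above),
# -sS hides the progress meter but still prints error messages.
fetch_quality() {
  curl -fsS "$(quality_url "$1" "$2" "$3")"
}

# Example call (commented out so that sourcing this file makes no request):
# fetch_quality voice-ai FluidInference FluidAudio
```

Because `curl -f` returns a non-zero exit status on HTTP errors, a calling script can check the status of `fetch_quality` to detect a rejected request, for example once the daily request limit is reached.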
Related tools
k2-fsa/sherpa-ncnn
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn...
phuc-nt/my-translator
Real-time speech translation — macOS & Windows, free TTS, no server, your API keys only
pot-app/pot-desktop
🌈 A cross-platform tool for text-selection translation and OCR.
Blaizzy/mlx-audio-swift
A modular Swift SDK for audio processing with MLX on Apple Silicon
soniqo/speech-swift
AI speech toolkit for Apple Silicon — ASR, TTS, speech-to-speech, VAD, and diarization powered...