FluidAudio and speech-swift
FluidAudio and speech-swift are direct competitors: both provide on-device speech processing (ASR, TTS, VAD, and speaker diarization) for Apple platforms, and both emphasize local ML inference. FluidAudio is the more mature project (more GitHub stars), while speech-swift offers a broader feature set, including full speech-to-speech.
About FluidAudio
FluidInference/FluidAudio
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
Inference offloads to the Apple Neural Engine (ANE) for minimal CPU/GPU usage and better battery life on always-on workloads. The library includes streaming ASR with end-of-utterance detection, inverse text normalization for post-processing, and both online and offline speaker diarization pipelines with advanced clustering. All models are open source (MIT/Apache 2.0) and hosted on Hugging Face, supporting 25 languages for transcription and 9 for TTS, with straightforward Swift integration.
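As a rough sketch of what that Swift integration might look like for streaming-style transcription (the type and method names here are illustrative assumptions, not FluidAudio's documented API):

```swift
import FluidAudio  // assumed module name

// Hypothetical usage sketch; `AsrModels`, `AsrManager`, and the method
// signatures below are assumptions for illustration only.
func transcribe(samples: [Float]) async throws -> String {
    // Fetch the open-source CoreML models from Hugging Face (assumed loader).
    let models = try await AsrModels.downloadAndLoad()

    let asr = AsrManager()
    try await asr.initialize(models: models)

    // 16 kHz mono Float32 audio assumed; inference runs on the ANE.
    let result = try await asr.transcribe(samples)
    return result.text
}
```

Consult the FluidAudio README for the actual entry points and supported audio formats before relying on any of these names.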
About speech-swift
soniqo/speech-swift
AI speech toolkit for Apple Silicon — ASR, TTS, speech-to-speech, VAD, and diarization powered by MLX and CoreML
Provides comprehensive on-device speech pipeline models (ASR, TTS, voice cloning, diarization, VAD, enhancement) optimized for MLX and CoreML, enabling sub-second streaming latency and Neural Engine acceleration on macOS/iOS without external APIs. Bundles curated models from Alibaba, NVIDIA, and others—from lightweight 82M-param TTS to 7B full-duplex speech-to-speech—with quantization profiles (4-bit/8-bit INT, FP16) and pre-compiled CoreML variants sized for on-device constraints. Installable via Homebrew or Swift Package Manager with native Swift bindings for Mac and iOS integration.
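Adding the package via Swift Package Manager might look like the following manifest fragment (the repository URL follows from the project name above; the version requirement and product name are placeholders, not a published release):

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        // Version is a placeholder; check the repo for actual release tags.
        .package(url: "https://github.com/soniqo/speech-swift", from: "0.1.0"),
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            // Product name is assumed; see the package's Package.swift.
            dependencies: [.product(name: "SpeechSwift", package: "speech-swift")]
        )
    ]
)
```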