speech-swift and mlx-audio-swift
These are complementary tools: mlx-audio-swift, the audio-processing SDK, provides the lower-level signal handling and MLX integration, while the speech-swift toolkit builds on that layer for higher-level ASR, TTS, and diarization tasks.
About speech-swift
soniqo/speech-swift
AI speech toolkit for Apple Silicon — ASR, TTS, speech-to-speech, VAD, and diarization powered by MLX and CoreML
Provides a comprehensive suite of on-device speech pipeline models (ASR, TTS, voice cloning, diarization, VAD, enhancement) optimized for MLX and CoreML, enabling sub-second streaming latency and Neural Engine acceleration on macOS and iOS without external APIs. Bundles curated models from Alibaba, NVIDIA, and others, ranging from a lightweight 82M-parameter TTS model to a 7B-parameter full-duplex speech-to-speech model, with quantization profiles (4-bit and 8-bit integer, FP16) and pre-compiled CoreML variants sized for on-device constraints. Installable via Homebrew or Swift Package Manager, with native Swift bindings for macOS and iOS integration.
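As an illustration of the Swift Package Manager route, a manifest could declare the dependency roughly like this. This is a sketch: the package URL is inferred from the repository name above, and the version requirement and product name are assumptions, so check the repository's README for the actual values.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.macOS(.v14), .iOS(.v17)],
    dependencies: [
        // URL inferred from the soniqo/speech-swift repository name;
        // the version requirement is an assumption, pin to a real release.
        .package(url: "https://github.com/soniqo/speech-swift", from: "0.1.0"),
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                // Product name is assumed here; use the name the
                // package actually exports.
                .product(name: "Speech", package: "speech-swift"),
            ]
        ),
    ]
)
```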
About mlx-audio-swift
Blaizzy/mlx-audio-swift
A modular Swift SDK for audio processing with MLX on Apple Silicon
Provides modular audio AI capabilities spanning text-to-speech, speech-to-text, voice activity detection, speaker diarization, and speech enhancement via MLX inference on Apple Silicon. Built as composable Swift packages with streaming support and automatic HuggingFace model loading, it integrates codecs (SNAC, Encodec, Vocos) and supports multiple model families (Qwen3, Fish Audio, Soprano, Voxtral, Sortformer) with native async/await APIs.
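The streaming async/await style both SDKs advertise can be illustrated with plain Swift concurrency, independent of either library's actual API. The sketch below uses only the standard library: a producer yields audio chunks through an `AsyncStream` as they become available, and the consumer awaits them in arrival order. The type and function names here are hypothetical, not the SDKs' real identifiers.

```swift
import Foundation

// Hypothetical chunk type for illustration only.
struct AudioChunk {
    let samples: [Float]
}

// Stand-in for incremental model inference: emits one chunk per
// word of input instead of waiting for the full utterance.
func synthesizeStream(text: String) -> AsyncStream<AudioChunk> {
    AsyncStream { continuation in
        Task {
            for word in text.split(separator: " ") {
                let placeholder = [Float](repeating: 0, count: word.count)
                continuation.yield(AudioChunk(samples: placeholder))
            }
            continuation.finish()
        }
    }
}

// Usage: process chunks as they arrive, which is what enables
// sub-second time-to-first-audio in a streaming pipeline.
Task {
    for await chunk in synthesizeStream(text: "hello streaming world") {
        print("received \(chunk.samples.count) samples")
    }
}
```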