whisper-timestamped and Whisper-Finetune
The two projects are complementary: whisper-timestamped extends Whisper's base transcription output with word-level timing at inference time, while Whisper-Finetune adapts Whisper itself through custom training on domain-specific data.
About whisper-timestamped
linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Builds on OpenAI's Whisper by applying Dynamic Time Warping to cross-attention weights for precise word-level timestamp alignment without additional inference steps. Includes optional Voice Activity Detection preprocessing to reduce hallucinations and provides per-word confidence scores alongside segment-level predictions. Designed as a drop-in extension compatible with any Whisper version, with memory-efficient processing for long audio files.
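The core of the timestamping approach is Dynamic Time Warping: finding the cheapest monotonic path through a (token x frame) cost matrix derived from cross-attention weights, then reading each word's start and end off the frames its tokens align to. The sketch below illustrates that alignment step on a toy cost matrix; the matrix values, frame granularity, and backtracking details are illustrative stand-ins, not whisper-timestamped's actual implementation.

```python
# Sketch of the DTW idea: align decoder tokens (rows) to audio frames
# (columns) along a monotonic path through a cost matrix. In the real
# tool the costs come from cross-attention weights; here they are toys.

def dtw_path(cost):
    """Cheapest monotonic path from (0, 0) to (n-1, m-1), where each
    step advances the token index, the frame index, or both."""
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    acc = [[INF] * m for _ in range(n)]
    acc[0][0] = cost[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1][j] if i else INF,                # next token
                acc[i][j - 1] if j else INF,                # next frame
                acc[i - 1][j - 1] if i and j else INF,      # both
            )
            acc[i][j] = cost[i][j] + best
    # Backtrack from the corner to recover the optimal path.
    path, i, j = [(n - 1, m - 1)], n - 1, m - 1
    while (i, j) != (0, 0):
        candidates = []
        if i and j:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    return path[::-1]

# Low cost where (toy) attention is strong, high cost elsewhere.
cost = [
    [0, 1, 9, 9, 9],
    [9, 0, 0, 9, 9],
    [9, 9, 9, 0, 1],
]
path = dtw_path(cost)

# The first and last frame each token row touches give its time span
# (multiply the frame index by the frame duration to get seconds).
spans = {}
for tok, frame in path:
    lo, hi = spans.get(tok, (frame, frame))
    spans[tok] = (min(lo, frame), max(hi, frame))
print(spans)  # {0: (0, 0), 1: (1, 2), 2: (3, 4)}
```

Because the path reuses attention weights the decoder already computed, the alignment costs one dynamic-programming pass per segment rather than a second inference run, which is why the README can claim word timestamps "without additional inference steps".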
About Whisper-Finetune
yeyupiaoling/Whisper-Finetune
Fine-tunes the Whisper speech recognition model, supporting training with timestamp data, without timestamp data, and without speech data. Accelerates inference and supports Web, Windows desktop, and Android deployment.
Implements parameter-efficient fine-tuning using LoRA adapters while maintaining compatibility with OpenAI's base Whisper models across all variants (tiny through large-v3-turbo). Provides dual inference-acceleration paths through CTranslate2 and GGML quantization, enabling deployment across heterogeneous platforms; fine-tuned and original Whisper checkpoints alike convert directly. Integrates with the PyTorch/Transformers ecosystem and includes end-to-end tooling: training pipelines supporting mixed data conditions, evaluation harnesses, and turnkey deployment templates for web services, native Windows desktop apps, and Android via JNI bindings.
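The LoRA technique behind the parameter-efficient fine-tuning keeps the pretrained weight matrix W frozen and trains only a low-rank update: two small matrices A (r x d) and B (d x r) with r much smaller than d, applied as W_eff = W + (alpha / r) * B @ A. The sketch below shows that arithmetic on toy 2x2 values; the matrices, rank, and alpha are illustrative assumptions, not values from Whisper-Finetune.

```python
# Sketch of the LoRA update: the frozen weight W gains a scaled
# low-rank correction B @ A. Only A and B are trained, so the
# trainable parameter count drops from d*d to 2*r*d.

def matmul(X, Y):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r, alpha = 2, 1, 2.0
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen pretrained weights (d x d)
A = [[0.5, 0.5]]               # trainable down-projection (r x d)
B = [[1.0], [2.0]]             # trainable up-projection   (d x r)

delta = matmul(B, A)           # rank-r update, shape d x d
scale = alpha / r
W_eff = [[w + scale * dw for w, dw in zip(w_row, d_row)]
         for w_row, d_row in zip(W, delta)]
print(W_eff)  # [[2.0, 1.0], [2.0, 3.0]]
```

Because the update is additive, B @ A can be merged back into W after training, yielding a standard Whisper checkpoint; that merged form is what then feeds the CTranslate2 or GGML conversion step for accelerated deployment.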