shashikg/WhisperS2T
An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines
Supports multiple inference backends (original OpenAI, HuggingFace with FlashAttention2, CTranslate2, TensorRT-LLM). Uses intelligent audio batching, VAD integration, and dynamic time-length processing to reduce computation overhead; applies heuristics to reduce hallucinations; and loads large files asynchronously while already-batched segments are being transcribed. Supports multi-language and multi-task decoding in a single batch pass.
541 stars. No commits in the last 6 months.
Stars: 541
Forks: 73
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Aug 27, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/shashikg/WhisperS2T"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
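The same endpoint can be queried from Python instead of curl. A minimal sketch using only the standard library; the `X-API-Key` header name used for the optional free key is an assumption, not confirmed by this page, so check the API's documentation for the real authentication scheme:

```python
# Sketch of calling the data API from Python; the endpoint URL comes
# from the curl example above.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/voice-ai"

def build_request(owner, repo, api_key=None):
    """Build a GET request for one repository's quality data."""
    url = f"{BASE}/{owner}/{repo}"
    headers = {}
    if api_key is not None:
        headers["X-API-Key"] = api_key  # assumed header name, not confirmed
    return urllib.request.Request(url, headers=headers)

# Anonymous access (100 requests/day, per the note above):
req = build_request("shashikg", "WhisperS2T")
# data = json.load(urllib.request.urlopen(req))  # uncomment to fetch live
```

The request object is built separately from the fetch so the URL and headers can be inspected (or logged) before any network call is made.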
Compare
Higher-rated alternatives
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
oseiskar/autosubsync
Automatically synchronize subtitles with audio using machine learning
FL33TW00D/whisper-turbo
Cross-Platform, GPU Accelerated Whisper 🏎️
machinelearningZH/audio-transcription
Transcribe any audio or video file. Edit and view your transcripts in a standalone HTML editor.
saharmor/whisper-playground
Build real time speech2text web apps using OpenAI's Whisper https://openai.com/blog/whisper/