pulijon/Sttcast

Transcribes MP3 files to HTML, with or without an embedded player

Emerging

Uses WhisperX with CUDA acceleration and Pyannote for automatic speaker diarization to turn audio into timestamped transcripts, and adds semantic search via RAG (Retrieval-Augmented Generation) backed by OpenAI embeddings and a FAISS vector index. The three-tier architecture separates transcription jobs (port 8000), RAG inference (port 5500), and vector/database queries (port 8001), so each service can be scaled independently; both GPU-accelerated and CPU-only processing pipelines are supported. Uses PostgreSQL for web interface state and Flask/FastAPI for its frontends, and offers CLI, web UI, and semantic search interfaces for podcast collections.
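The retrieval step of semantic search over timestamped transcripts can be illustrated with a minimal sketch. This is not Sttcast's code: the `Segment` shape, the `embed`, `cosine`, `search`, and `timestamp` helpers, and the sample data are all hypothetical. A toy bag-of-words vector stands in for OpenAI embeddings, and a brute-force scan stands in for a FAISS index, so that the example runs with the standard library alone.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized, timestamped transcript segment (hypothetical shape)."""
    start: float   # seconds from the start of the episode
    speaker: str   # diarization label
    text: str


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, segments: list[Segment], k: int = 2) -> list[Segment]:
    """Return the k segments most similar to the query (brute force, no index)."""
    q = embed(query)
    ranked = sorted(segments, key=lambda s: cosine(q, embed(s.text)), reverse=True)
    return ranked[:k]


def timestamp(seconds: float) -> str:
    """Format a segment offset as MM:SS for display next to a search hit."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Made-up sample data, for illustration only.
SEGMENTS = [
    Segment(12.5, "SPEAKER_00", "welcome to the show about space telescopes"),
    Segment(95.0, "SPEAKER_01", "the mirror of the telescope is made of beryllium"),
    Segment(240.0, "SPEAKER_00", "next week we talk about volcanoes"),
]

if __name__ == "__main__":
    for hit in search("telescope mirror", SEGMENTS, k=1):
        print(timestamp(hit.start), hit.speaker, hit.text)
```

In a RAG setup, the retrieved segments would then be passed to a language model as context, and the stored start times let each answer link back to the matching position in the audio.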

No Package, No Dependents

Maintenance 10 / 25

Adoption 7 / 25

Maturity 9 / 25

Community 15 / 25

Language

Jupyter Notebook

License

GPL-3.0

Related tools

alias454/YATSEE

YATSEE - Yet Another Tool for Speech Extraction & Enrichment

okamyuji/meeting-transcriber

Japanese meeting transcription & minutes generation app with local ASR (Kotoba Whisper) + LLM...