pulijon/Sttcast
Transcription of MP3 files to HTML, with or without an embedded player
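The tagline's core idea can be sketched as follows: timestamped segments are rendered to an HTML page whose timestamps seek an embedded audio player. This is a minimal sketch, not Sttcast's actual output format. The `Segment` structure and the `fmt_time`/`render_html` helpers are hypothetical. The seek links use the W3C media-fragment syntax (`#t=seconds`) plus a small `onclick` handler that sets the player's `currentTime`.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Segment:
    """One timestamped transcript segment (hypothetical structure)."""
    start: float   # seconds from the start of the audio
    end: float
    speaker: str   # e.g. a diarization label such as "SPEAKER_00"
    text: str


def fmt_time(seconds: float) -> str:
    """Format a position in seconds as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_html(mp3_name: str, segments: list[Segment], embed_player: bool = True) -> str:
    """Render segments as a standalone HTML page.

    With embed_player=True the page gets an <audio> element and every
    timestamp becomes a link that seeks that player; otherwise the
    timestamps are plain text.
    """
    src = escape(mp3_name)
    parts = ["<!DOCTYPE html>",
             '<html><head><meta charset="utf-8">',
             f"<title>{src}</title></head><body>"]
    if embed_player:
        parts.append(f'<audio id="player" controls src="{src}"></audio>')
    for seg in segments:
        label = fmt_time(seg.start)
        if embed_player:
            # The onclick handler seeks the embedded player; the href uses a
            # media-fragment URI (#t=seconds) as a no-JavaScript fallback.
            label = (f'<a href="{src}#t={seg.start:.2f}" '
                     f"onclick=\"var p=document.getElementById('player');"
                     f"p.currentTime={seg.start:.2f};p.play();return false;\">"
                     f"{label}</a>")
        parts.append(f"<p>[{label}] <b>{escape(seg.speaker)}</b>: {escape(seg.text)}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


# Usage with made-up segments:
segments = [Segment(0.0, 4.2, "SPEAKER_00", "Welcome to the show."),
            Segment(4.2, 9.8, "SPEAKER_01", "Thanks for having me & hello < world.")]
page = render_html("episode01.mp3", segments)
```

Passing `embed_player=False` yields the "without embedded player" variant: the same transcript with plain-text timestamps.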
Uses WhisperX with CUDA acceleration for transcription and Pyannote for automatic speaker diarization, turning audio into timestamped transcripts. Semantic search over those transcripts uses RAG (Retrieval-Augmented Generation) backed by OpenAI embeddings and FAISS vector indexes. A three-tier architecture separates transcription jobs (port 8000), RAG inference (port 5500), and vector/database queries (port 8001), so each tier can scale independently; both GPU-accelerated and CPU-only processing pipelines are supported. PostgreSQL stores web-interface state, Flask and FastAPI serve the various frontends, and podcast collections can be used through a CLI, a web UI, or semantic search.
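The retrieval half of the RAG pipeline described above can be sketched as exact top-k search by cosine similarity. FAISS's `IndexFlatIP` performs exact inner-product search, which equals cosine similarity when vectors are L2-normalized. This sketch computes the same thing in plain Python so it runs without FAISS or an OpenAI key. The `Chunk` structure, the helper names, and the 3-d toy vectors are hypothetical stand-ins for real embeddings. The final generation step, sending the prompt to a language model, is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """A transcript excerpt plus its embedding (hypothetical structure)."""
    episode: str
    start: float          # seconds into the episode
    speaker: str
    text: str
    vector: list[float]   # embedding; toy 3-d values in the usage below


def normalize(v: list[float]) -> list[float]:
    """Scale v to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def search(query_vec: list[float], chunks: list[Chunk], k: int = 3) -> list[tuple[float, Chunk]]:
    """Exact top-k search by cosine similarity, i.e. what a flat
    inner-product index returns when all vectors are normalized."""
    q = normalize(query_vec)
    scored = [(sum(a * b for a, b in zip(q, normalize(c.vector))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, hits: list[tuple[float, Chunk]]) -> str:
    """Augment the question with retrieved excerpts, keeping episode,
    timestamp and speaker so an answer can point back to the audio."""
    context = "\n".join(f"[{c.episode} @ {c.start:.0f}s, {c.speaker}] {c.text}" for _, c in hits)
    return ("Answer using only these transcript excerpts:\n"
            f"{context}\n\nQuestion: {question}")


# Usage with toy vectors standing in for real embeddings:
chunks = [
    Chunk("ep01", 12.0, "SPEAKER_00", "We talk about black holes.", [1.0, 0.0, 0.0]),
    Chunk("ep02", 95.0, "SPEAKER_01", "Recipe for paella.", [0.0, 1.0, 0.0]),
    Chunk("ep03", 40.0, "SPEAKER_00", "Event horizons explained.", [0.9, 0.1, 0.0]),
]
hits = search([1.0, 0.0, 0.0], chunks, k=2)
prompt = build_prompt("What is said about black holes?", hits)
```

Keeping the episode, timestamp, and speaker on each chunk is what lets a search result link back to the exact point in the audio. In the architecture above, a lookup like `search` would plausibly sit behind the vector/database query service; that placement is an assumption.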
Stars: 25
Forks: 5
Language: Jupyter Notebook
License: GPL-3.0
Category:
Last pushed: Feb 14, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/pulijon/Sttcast"
Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.
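A Python equivalent of the curl call can be sketched with the standard library alone. The base URL comes from the curl example. Treating the trailing path segments as `category/owner/repo` is an assumption generalized from that single URL, and so is the expectation that the response body is JSON. How to attach a key for the higher limit is not described here, so the sketch uses only the keyless tier.

```python
import json
import urllib.request

# Base path taken from the curl example; reading the trailing segments as
# category/owner/repo is an assumption generalized from that single URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint without a key (keyless tier) and decode the body,
    assuming the service answers with JSON."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a network request):
# data = fetch_quality("voice-ai", "pulijon", "Sttcast")
# print(json.dumps(data, indent=2))
```

`urlopen` raises `urllib.error.HTTPError` for non-success status codes, so a caller that exceeds the daily limit would see an exception rather than an empty result.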