harmlessman/PAFTS
PAFTS: a library that preprocesses audio for TTS.
Integrates UVR for vocal/music separation, pyannote-audio for speaker diarization, and OpenAI's Whisper for speech-to-text transcription to create speaker-isolated, noise-cleaned training datasets. The pipeline automatically organizes output into speaker-labeled directories with corresponding JSON transcriptions, enabling end-to-end conversion of raw multi-speaker audio into structured TTS training data. Requires PyTorch (GPU-accelerated), FFmpeg, and HuggingFace authentication for diarization models.
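The output organization described above (one directory per speaker, each with a JSON transcription) can be sketched as follows. This is only an illustration using the standard library: the function name `write_transcripts`, the record keys `speaker`, `file`, and `text`, and the file name `transcripts.json` are hypothetical and are not taken from PAFTS itself, whose actual layout may differ.

```python
import json
from pathlib import Path


def write_transcripts(segments, out_dir):
    """Group transcribed clips by speaker and write one JSON file per speaker.

    `segments` is an iterable of dicts with the hypothetical keys
    "speaker", "file", and "text". Returns the sorted speaker labels.
    """
    by_speaker = {}
    for seg in segments:
        # Map each clip file name to its transcription, per speaker.
        by_speaker.setdefault(seg["speaker"], {})[seg["file"]] = seg["text"]

    out_dir = Path(out_dir)
    for speaker, transcripts in by_speaker.items():
        speaker_dir = out_dir / speaker
        speaker_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical file name; the real library may use another.
        (speaker_dir / "transcripts.json").write_text(
            json.dumps(transcripts, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
    return sorted(by_speaker)
```

In a real pipeline the `segments` records would come from the diarization and speech-to-text stages; here they are plain dictionaries so the grouping step can be read on its own.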
No commits in the last 6 months. Available on PyPI.
Stars: 27
Forks: 5
Language: Python
License: MIT
Category:
Last pushed: Nov 15, 2024
Monthly downloads: 33
Commits (30d): 0
Dependencies: 25
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/harmlessman/PAFTS"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
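The same request can be made from Python with the standard library. This is a minimal sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the category/owner/repo path order is inferred from the curl URL above, and the assumption that the endpoint returns a JSON body is not confirmed by this page.

```python
import json
import urllib.request

# Base path taken from the curl example; everything after it is
# category/owner/repo, as inferred from that URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response.

    Assumption: the body is JSON. If it is not, json.load raises an error.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("voice-ai", "harmlessman", "PAFTS")` would issue the same GET request as the curl command; with no key, it counts against the 100 requests/day limit.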
Higher-rated alternatives
KoljaB/RealtimeTTS
Converts text to speech in realtime
nateshmbhat/pyttsx3
Offline Text To Speech synthesis for python
pndurette/gTTS
Python library and CLI tool to interface with Google Translate's text-to-speech API
n1teshy/yapper-tts
offline text to speech and free SOTA LLM APIs to let your programs speak to you
dputhier/pygtftk
A python package and a set of shell commands to handle GTF files