harmlessman/PAFTS

PAFTS: a library for Preprocessing Audio For TTS.

43 / 100 (Emerging)

Integrates UVR for vocal/music separation, pyannote-audio for speaker diarization, and OpenAI's Whisper for speech-to-text transcription to create speaker-isolated, noise-cleaned training datasets. The pipeline automatically organizes output into speaker-labeled directories with corresponding JSON transcriptions, enabling end-to-end conversion of raw multi-speaker audio into structured TTS training data. Requires PyTorch (GPU-accelerated), FFmpeg, and HuggingFace authentication for diarization models.
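The final step described above, sorting results into speaker-labeled directories with JSON transcriptions, can be sketched roughly as follows. This is not PAFTS's own code or API: the function name `write_speaker_dataset`, the segment fields (`speaker`, `start`, `end`, `text`), and the `transcript.json` file name are all hypothetical, chosen only to illustrate the idea. It assumes diarization and transcription have already produced one record per spoken segment.

```python
import json
from collections import defaultdict
from pathlib import Path


def write_speaker_dataset(segments, out_dir):
    """Group transcribed segments by speaker and write one JSON file per speaker.

    Hypothetical layout (not taken from PAFTS): <out_dir>/<speaker>/transcript.json
    """
    by_speaker = defaultdict(list)
    for seg in segments:
        by_speaker[seg["speaker"]].append(
            {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
        )

    out_dir = Path(out_dir)
    written = {}
    for speaker, items in by_speaker.items():
        speaker_dir = out_dir / speaker
        speaker_dir.mkdir(parents=True, exist_ok=True)
        # Keep each speaker's segments in chronological order.
        items.sort(key=lambda s: s["start"])
        path = speaker_dir / "transcript.json"
        path.write_text(
            json.dumps(items, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written[speaker] = path
    return written
```

Calling `write_speaker_dataset(segments, "dataset")` returns a mapping from each speaker label to the JSON file written for that speaker. A real pipeline would also need to cut and store the matching audio clips, which this sketch leaves out.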

No commits in the last 6 months. Available on PyPI.

Maintenance 0 / 25
Adoption 11 / 25
Maturity 18 / 25
Community 14 / 25
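
The four category scores listed above appear to add up to the overall score. The page does not state the formula, so treating the total as a plain sum of the four categories is an assumption; the short check below only confirms that the numbers shown are consistent with it.

```python
# Sub-scores as listed on this page, each out of 25.
subscores = {"Maintenance": 0, "Adoption": 11, "Maturity": 18, "Community": 14}

# Assumption: the overall score is the unweighted sum of the four categories.
total = sum(subscores.values())
maximum = 25 * len(subscores)

print(f"{total} / {maximum}")  # prints "43 / 100"
```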


Stars 27
Forks 5
Language Python
License MIT
Last pushed Nov 15, 2024
Monthly downloads 33
Commits (30d) 0
Dependencies 25

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/harmlessman/PAFTS"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
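
The same request can be made from Python using only the standard library. This is a minimal sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the category/owner/repo path layout is inferred from the single URL shown above, and the response is assumed to be JSON, since its schema is not documented here.

```python
import json
import urllib.request

# Base path taken from the curl example; the generalization to other
# categories and repositories is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the quality-score URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality record and decode it, assuming a JSON response body."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For this repository the call would be `fetch_quality("voice-ai", "harmlessman", "PAFTS")`. Without a key, requests count against the 100 per day limit mentioned above.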