rishikksh20/VocGAN
VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
Replaces VocGAN's original hierarchically-nested discriminator with MelGAN's Full-Band discriminator, which significantly reduces training time while maintaining audio fidelity; the generator converts mel-spectrograms to audio in real time. Built in PyTorch, it supports single-speaker (LJSpeech, KSS) and multi-speaker (VCTK) datasets at 22.05 kHz, and ships a trainer with TensorBoard integration for end-to-end training plus an inference pipeline for mel-to-audio conversion.
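To illustrate what the swapped-in discriminator sees: the MelGAN paper describes a multi-scale discriminator made of three identical sub-discriminators, applied to the raw waveform and to copies downsampled 2x and 4x by average pooling. The sketch below builds only those three inputs, in plain Python rather than PyTorch. The pooling parameters (kernel 4, stride 2, padding 1, averaging only over real samples, i.e. `count_include_pad=False`) are an assumption based on common MelGAN implementations, not something confirmed for this repository, and the function names are hypothetical.

```python
from typing import List


def avg_pool1d(x: List[float], kernel: int = 4, stride: int = 2, padding: int = 1) -> List[float]:
    """1-D average pooling; like count_include_pad=False, padded positions are not averaged."""
    n = len(x)
    out_len = (n + 2 * padding - kernel) // stride + 1
    out = []
    for i in range(out_len):
        start = i * stride - padding
        # Keep only indices inside the signal; implicit zero-padding is ignored.
        window = [x[j] for j in range(start, start + kernel) if 0 <= j < n]
        out.append(sum(window) / len(window))
    return out


def multiscale_inputs(wave: List[float], num_scales: int = 3) -> List[List[float]]:
    """Hypothetical helper: full-rate waveform, then 2x and 4x pooled copies (one per sub-discriminator)."""
    scales = [list(wave)]
    for _ in range(num_scales - 1):
        scales.append(avg_pool1d(scales[-1]))
    return scales


if __name__ == "__main__":
    scales = multiscale_inputs([float(i) for i in range(16)])
    print([len(s) for s in scales])  # [16, 8, 4]
```

Each sub-discriminator judges the audio at a different temporal resolution, which is one reason a comparatively small discriminator can still push the generator toward realistic waveforms.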
321 stars. No commits in the last 6 months.
Stars: 321
Forks: 59
Language: Python
License: MIT
Last pushed: Jul 25, 2024
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/rishikksh20/VocGAN"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
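The same request can be made from Python with only the standard library. This is a minimal sketch: the URL pattern (`/quality/<category>/<owner>/<repo>`) is inferred from the single curl example and is an assumption, the helper names are hypothetical, the response is assumed to be JSON with an undocumented schema, and the sketch sends no API key because the listing does not say how a key is passed.

```python
import json
import urllib.request
from typing import Any

# Assumption: base path and URL layout inferred from the one curl example.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, category: str = "voice-ai") -> str:
    """Build the (assumed) endpoint URL for one repository."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, category: str = "voice-ai", timeout: float = 10.0) -> Any:
    """GET the endpoint anonymously and decode the body as JSON (schema unknown)."""
    url = quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_quality("rishikksh20", "VocGAN")` requests the same URL as the curl command and counts against the anonymous daily limit.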
Higher-rated alternatives
shangeth/wavencoder
WavEncoder is a Python library for encoding audio signals, transforms for audio augmentation,...
fatchord/WaveRNN
WaveRNN Vocoder + TTS
kan-bayashi/ParallelWaveGAN
Unofficial Parallel WaveGAN (+ MelGAN & Multi-band MelGAN & HiFi-GAN & StyleMelGAN) with Pytorch
HAKORADev/VODER
Voice Operation and Design Engine with Reproduction capabilities
seungwonpark/melgan
MelGAN vocoder (compatible with NVIDIA/tacotron2)