Emotional-Text-to-Speech/dl-for-emo-tts
💻 🤖 A summary on our attempts at using Deep Learning approaches for Emotional Text to Speech 🔈
Implements Tacotron and DC-TTS models that are pre-trained on LJ Speech and then fine-tuned on emotional speech datasets (RAVDESS, EMOV-DB), systematically exploring transfer learning strategies such as encoder freezing and learning rate adjustments. Provides reproducible training pipelines and a comparative analysis across multiple emotion corpora, and documents the configurations that succeeded for single-speaker synthesis with monotonic attention constraints.
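As a rough illustration of the two transfer learning strategies named above (encoder freezing and a reduced learning rate), here is a minimal sketch. It is not the repository's code: PyTorch is chosen only for brevity, `TinyTTS` is a hypothetical stand-in for a Tacotron- or DC-TTS-style model, and the learning rate value and the checkpoint filename in the comment are assumptions.

```python
import torch
from torch import nn


class TinyTTS(nn.Module):
    """Hypothetical stand-in for an encoder-decoder TTS model (not the repo's architecture)."""

    def __init__(self, vocab_size=32, hidden=16, n_mels=8):
        super().__init__()
        # Text encoder: token embedding followed by a projection.
        self.encoder = nn.Sequential(
            nn.Embedding(vocab_size, hidden),
            nn.Linear(hidden, hidden),
        )
        # Decoder: maps encoder states to mel-spectrogram frames.
        self.decoder = nn.Linear(hidden, n_mels)

    def forward(self, tokens):
        return self.decoder(self.encoder(tokens))


def prepare_for_finetuning(model, lr=1e-4):
    """Freeze the encoder and build an optimizer over the remaining parameters.

    The learning rate is an assumed value, typically chosen lower than the
    one used for pre-training.
    """
    for param in model.encoder.parameters():
        param.requires_grad = False  # encoder weights stay at their pre-trained values
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)


model = TinyTTS()
# In practice, pre-trained weights would be loaded before freezing, e.g.
# model.load_state_dict(torch.load("ljspeech_checkpoint.pt"))  # hypothetical path
optimizer = prepare_for_finetuning(model, lr=1e-4)
```

Only the decoder parameters reach the optimizer, so fine-tuning on a small emotional corpus cannot overwrite the text representation learned during pre-training.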
458 stars. No commits in the last 6 months.
Stars: 458
Forks: 45
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jun 26, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Emotional-Text-to-Speech/dl-for-emo-tts"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
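The same request can be made from Python with only the standard library. This is a sketch under assumptions: the path layout (category, then owner, then repository) is inferred from the single curl example above, the function names are hypothetical, and because the response format is not documented here the body is returned as raw text.

```python
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def build_request(owner, repo, category="ml-frameworks"):
    """Build a GET request for one repository (URL layout is an assumption)."""
    url = f"{API_BASE}/{category}/{owner}/{repo}"
    return urllib.request.Request(url)


def fetch_quality(owner, repo, category="ml-frameworks", timeout=10):
    """Send the request and return the response body as text.

    UTF-8 decoding is an assumption; no API key is sent, so the keyless
    daily request limit applies.
    """
    request = build_request(owner, repo, category)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_request("Emotional-Text-to-Speech", "dl-for-emo-tts")
print(request.full_url)
```

Calling `fetch_quality("Emotional-Text-to-Speech", "dl-for-emo-tts")` performs the network call; building the request separately makes the URL easy to inspect without spending one of the daily requests.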
Higher-rated alternatives
aqibsaeed/Urban-Sound-Classification
Urban sound classification using Deep Learning
spotify/realbook
Easier audio-based machine learning with TensorFlow.
ArmDeveloperEcosystem/ml-audio-classifier-example-for-pico
ML Audio Classifier Example for Pico 🔊🔥🔔
IliaZenkov/sklearn-audio-classification
An in-depth analysis of audio classification on the RAVDESS dataset. Feature engineering,...
mimbres/neural-audio-fp
Official implementation of Neural Audio Fingerprint (ICASSP 2021)