saiteja-talluri/Speech2Face
Implementation of the CVPR 2019 Paper - Speech2Face: Learning the Face Behind a Voice by MIT CSAIL
Learns cross-modal associations between speech and facial features from audio spectrograms and FaceNet embeddings. The model is trained on the AVSpeech dataset, using face detection and frame extraction pipelines to prepare the data. It predicts facial embeddings from audio input and is evaluated with face retrieval metrics, which test whether the predicted features identify the correct speaker's face. Supports multi-GPU training with configurable batch processing, and includes preprocessing utilities for downloading and preparing audio-video data from YouTube.
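The face retrieval evaluation mentioned above can be illustrated with a minimal sketch. This is not the repository's code: the function names (`l2_normalize`, `recall_at_k`), the use of cosine similarity, the 128-dimensional embedding size, and the synthetic data are all assumptions chosen for illustration. The idea is that each predicted embedding queries a gallery of true face embeddings, and a query counts as a hit when the matching face ranks among the top K results.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def recall_at_k(predicted, gallery, k=1):
    """Fraction of queries whose true face is among the k nearest gallery faces.

    Assumption for this sketch: row i of `predicted` (embedding predicted
    from speech) corresponds to row i of `gallery` (embedding of the
    speaker's actual face).
    """
    p = l2_normalize(np.asarray(predicted, dtype=np.float64))
    g = l2_normalize(np.asarray(gallery, dtype=np.float64))
    sims = p @ g.T  # shape: (n_queries, n_gallery)
    # Indices of the k most similar gallery entries for every query.
    topk = np.argsort(-sims, axis=1)[:, :k]
    targets = np.arange(p.shape[0])[:, None]
    return float(np.mean(np.any(topk == targets, axis=1)))


# Synthetic stand-in data: 100 "faces" with 128-dimensional embeddings
# (an illustrative size), and noisy "predictions" of those embeddings.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 128))
predicted = gallery + 0.5 * rng.normal(size=gallery.shape)

for k in (1, 5, 10):
    print(f"recall@{k}: {recall_at_k(predicted, gallery, k):.3f}")
```

With real data, `predicted` would come from the speech model and `gallery` from a face embedding network applied to the speakers' frames; the ranking logic stays the same.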
178 stars. No commits in the last 6 months.
Stars: 178
Forks: 37
Language: Python
License: MIT
Category:
Last pushed: Mar 24, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/saiteja-talluri/Speech2Face"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
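The same request can be sketched in Python using only the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, and the reading of the path as category, owner, and repository is an assumption inferred from the URL above. The response format is not documented here, so the sketch returns the body as raw text rather than parsing it.

```python
import urllib.parse
import urllib.request

# Base path copied from the curl example; everything else is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL, assuming the path is category/owner/repo."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return f"{BASE_URL}/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Return the response body as text (hypothetical helper).

    No API key is sent, which the listing says is allowed for up to
    100 requests per day.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


print(quality_url("voice-ai", "saiteja-talluri", "Speech2Face"))
```

Calling `fetch_quality("voice-ai", "saiteja-talluri", "Speech2Face")` performs the network request; it is left out of the block so the example runs offline.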
Higher-rated alternatives
TensorSpeech/TensorFlowTTS
TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for...
lucasnewman/nanospeech
A simple, hackable text-to-speech system in PyTorch and MLX
Tomiinek/Multilingual_Text_to_Speech
An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing,...
jxzhanggg/nonparaSeq2seqVC_code
Implementation code of non-parallel sequence-to-sequence VC
keonlee9420/STYLER
Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech...