saiteja-talluri/Speech2Face

Implementation of the CVPR 2019 paper "Speech2Face: Learning the Face Behind a Voice" by MIT CSAIL

47 / 100 (Emerging)

Learns cross-modal associations between speech and facial features from audio spectrograms and FaceNet embeddings. Training uses the AVSpeech dataset, prepared with face detection and frame extraction pipelines. The model predicts a facial embedding from audio input and is evaluated with face retrieval metrics, which test whether the predicted embedding identifies the correct speaker's face. The repository supports multi-GPU training with configurable batch processing and includes preprocessing utilities for downloading and preparing audio-video data from YouTube.
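The face retrieval evaluation mentioned above can be illustrated with a small sketch. This is not the repository's code: the function names `cosine_similarity` and `recall_at_k` are hypothetical, and the choice of cosine similarity and recall@k is an assumption about how such a retrieval metric is commonly computed. Each predicted embedding is used as a query against a gallery of true face embeddings, and a query counts as a hit when the correct speaker's face ranks among the top k matches.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_k(predicted, gallery, k):
    """Fraction of queries whose true face is among the k most similar gallery faces.

    Assumption: predicted[i] (embedding predicted from speech) and gallery[i]
    (embedding of the true face) belong to the same speaker.
    """
    if len(predicted) != len(gallery):
        raise ValueError("predicted and gallery must have the same length")
    if not predicted:
        return 0.0
    hits = 0
    for i, query in enumerate(predicted):
        scores = [cosine_similarity(query, face) for face in gallery]
        # Gallery indices sorted from most to least similar to the query.
        ranked = sorted(range(len(gallery)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(predicted)
```

Real FaceNet embeddings are much higher-dimensional, so a vectorized implementation would be preferable at scale; the pure-Python version is only meant to make the metric concrete.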

178 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 21 / 25


Stars: 178
Forks: 37
Language: Python
License: MIT
Last pushed: Mar 24, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/saiteja-talluri/Speech2Face"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
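The same request can be made from Python with only the standard library. The sketch below rests on two assumptions that the listing does not confirm: that the endpoint returns JSON, and that the path follows a category/owner/repo pattern, which is inferred from the single URL shown above. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; the rest of the URL layout is inferred.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category, owner, repo):
    """Build the endpoint URL, percent-encoding each path segment."""
    segments = [urllib.parse.quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(segments)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the quality record and parse it as JSON (assumed response format)."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request; counts against the daily request limit.
    print(fetch_quality("voice-ai", "saiteja-talluri", "Speech2Face"))
```

No field names are read from the response here because the schema is not documented in this listing; inspect the returned object before relying on any key.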