saiteja-talluri/Speech2Face

Implementation of the CVPR 2019 Paper - Speech2Face: Learning the Face Behind a Voice by MIT CSAIL

/ 100

Emerging

Learns cross-modal associations between speech and facial features from audio spectrograms and FaceNet embeddings. The model is trained on the AVSpeech dataset, with face detection and frame extraction pipelines preparing the video side. It predicts a face embedding from audio input and is evaluated with face retrieval metrics, which test whether the predicted features identify the correct speaker's face. Supports multi-GPU training with configurable batch processing, and includes preprocessing utilities for downloading and preparing audio-video data from YouTube.
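To make the retrieval-based evaluation concrete, here is a minimal sketch of one common way to score it. It is not taken from the repository: the function names `cosine_similarity` and `recall_at_k`, the choice of cosine similarity, the recall@k metric, and the toy 3-dimensional vectors are all assumptions for illustration (FaceNet embeddings have far more dimensions). The idea is that each embedding predicted from audio is compared against a gallery of true face embeddings, and a query counts as a hit when the correct speaker's face ranks within the top k.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_k(predicted, gallery, k):
    """Fraction of queries whose true face is among the k most similar gallery faces.

    Assumption for this sketch: predicted[i] (from audio) belongs to the same
    speaker as gallery[i] (the true face embedding).
    """
    if not predicted or len(predicted) != len(gallery):
        raise ValueError("predicted and gallery must be non-empty and the same length")
    hits = 0
    for i, query in enumerate(predicted):
        scores = [cosine_similarity(query, face) for face in gallery]
        # Rank gallery indices from most to least similar to the query.
        ranked = sorted(range(len(gallery)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(predicted)


# Hypothetical toy data: three speakers, 3-dimensional embeddings.
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
predicted = [[0.9, 0.1, 0.0], [0.6, 0.5, 0.0], [0.0, 0.2, 0.9]]

if __name__ == "__main__":
    print("recall@1:", recall_at_k(predicted, gallery, 1))
    print("recall@2:", recall_at_k(predicted, gallery, 2))
```

In this toy data the second prediction lies closer to the first speaker's face than to its own, so it misses at k=1 and is recovered at k=2. For a gallery of realistic size, the per-query loop would normally be replaced by a single matrix product over normalized embeddings.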

178 stars. No commits in the last 6 months.

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 21 / 25


Stars: 178

Forks

Language: Python

License: MIT

Higher-rated alternatives

TensorSpeech/TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for...

lucasnewman/nanospeech

A simple, hackable text-to-speech system in PyTorch and MLX

Tomiinek/Multilingual_Text_to_Speech

An implementation of Tacotron 2 that supports multilingual experiments with parameter-sharing,...

jxzhanggg/nonparaSeq2seqVC_code

Implementation code of non-parallel sequence-to-sequence VC

keonlee9420/STYLER

Official repository of STYLER: Style Factor Modeling with Rapidity and Robustness via Speech...
