LatentSync and TempoSyncDiff

Both tools aim to enhance audio-driven talking head generation by improving the synchronization of speech with generated video, making them direct competitors in the "speech-synthesis-diffusion" category: LatentSync focuses on taming Stable Diffusion for lip sync, while TempoSyncDiff emphasizes faster generation at comparable quality.

|                | LatentSync                        | TempoSyncDiff              |
|----------------|-----------------------------------|----------------------------|
| Overall score  | 51 (Established)                  | 36 (Emerging)              |
| Maintenance    | 2/25                              | 13/25                      |
| Adoption       | 10/25                             | 2/25                       |
| Maturity       | 16/25                             | 9/25                       |
| Community      | 23/25                             | 12/25                      |
| Stars          | 5,506                             | 2                          |
| Forks          | 899                               | 1                          |
| Downloads      | —                                 | —                          |
| Commits (30d)  | 0                                 | 0                          |
| Language       | Python                            | Python                     |
| License        | Apache-2.0                        | MIT                        |
| Flags          | Stale 6m, No Package, No Dependents | No Package, No Dependents |

About LatentSync

bytedance/LatentSync

Taming Stable Diffusion for Lip Sync!

Audio-conditioned latent diffusion operating directly in Stable Diffusion's latent space, using Whisper-extracted audio embeddings injected via U-Net cross-attention layers. Trains with TREPA, LPIPS, and SyncNet losses in pixel space while working on compressed latents, avoiding intermediate motion representations. Supports multi-resolution training (256×256 to 512×512) with configurable efficiency modes, ranging from 20–55 GB VRAM depending on stage and resolution.
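The key mechanism described above is injecting audio embeddings into the denoising U-Net through cross-attention, where video latent tokens act as queries and audio frames as keys/values. A minimal sketch of that injection pattern in PyTorch; the module name, dimensions, and residual wiring are illustrative assumptions, not LatentSync's actual code:

```python
import torch
import torch.nn as nn

class AudioCrossAttention(nn.Module):
    """Video latent tokens (queries) attend to audio embeddings (keys/values),
    mimicking the audio-conditioning path of an audio-driven latent diffusion U-Net."""

    def __init__(self, latent_dim: int, audio_dim: int, heads: int = 4):
        super().__init__()
        # kdim/vdim let the audio embeddings keep their own dimensionality
        self.attn = nn.MultiheadAttention(
            latent_dim, heads, kdim=audio_dim, vdim=audio_dim, batch_first=True
        )
        self.norm = nn.LayerNorm(latent_dim)

    def forward(self, latents: torch.Tensor, audio_emb: torch.Tensor) -> torch.Tensor:
        # latents:   (B, N_tokens, latent_dim) — flattened spatial latent patches
        # audio_emb: (B, N_audio, audio_dim)   — e.g. Whisper-style frame embeddings
        attended, _ = self.attn(self.norm(latents), audio_emb, audio_emb)
        return latents + attended  # residual injection into the U-Net block

# Usage with toy shapes (illustrative, not the real model's sizes):
B, N, D = 2, 64, 320                 # batch, latent tokens, latent channels
audio = torch.randn(B, 50, 384)      # 50 audio frames, 384-dim embeddings
block = AudioCrossAttention(latent_dim=D, audio_dim=384)
out = block(torch.randn(B, N, D), audio)
```

The residual form means the block can learn to ignore audio early in training while the rest of the U-Net stays pretrained, which is one common reason this injection style is used when adapting Stable Diffusion.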

About TempoSyncDiff

mazumdarsoumya/TempoSyncDiff

Few-step diffusion for audio-driven talking head generation, making diffusion models speak faster without losing their composure.
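The repository itself is sparsely documented, but "few-step diffusion" generally means running a deterministic sampler (e.g. DDIM) over only a handful of timesteps instead of the full training schedule. A generic sketch under that assumption, in PyTorch; the function name, schedule, and noise-prediction interface are illustrative, not TempoSyncDiff's actual API:

```python
import torch

def ddim_few_step(model, x_T, alphas_cumprod, steps: int = 4):
    """Deterministic DDIM sampling with a small number of denoising steps.

    model(x, t) predicts the noise eps added to x at timestep t;
    alphas_cumprod is the cumulative noise schedule of length T.
    Fewer steps trade sample quality for speed — the core idea
    behind fast, few-step talking-head generation.
    """
    T = len(alphas_cumprod)
    timesteps = torch.linspace(T - 1, 0, steps).long()  # sparse subset of [T-1 .. 0]
    x = x_T
    for i, t in enumerate(timesteps):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[timesteps[i + 1]] if i + 1 < steps else torch.tensor(1.0)
        eps = model(x, t)
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # predicted clean latent
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # jump to the next sparse step
    return x

# Usage with a dummy noise predictor (illustrative only):
schedule = torch.linspace(0.999, 0.01, 1000)        # toy alphas_cumprod
dummy_model = lambda x, t: torch.zeros_like(x)      # pretend-perfect denoiser
sample = ddim_few_step(dummy_model, torch.randn(1, 4, 8, 8), schedule, steps=4)
```

With 4 steps instead of hundreds, each generated frame costs only 4 network evaluations, which is where the speedup over standard diffusion samplers comes from.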

Scores updated daily from GitHub, PyPI, and npm data.