LatentSync and TempoSyncDiff
Both tools aim to enhance audio-driven talking-head generation by improving the synchronization of speech with generated video, making them direct competitors in the "speech-synthesis-diffusion" category. LatentSync focuses on taming Stable Diffusion for lip sync, while TempoSyncDiff emphasizes faster generation without sacrificing quality.
About LatentSync
bytedance/LatentSync
Taming Stable Diffusion for Lip Sync!
An audio-conditioned latent diffusion model that operates directly in Stable Diffusion's latent space: Whisper-extracted audio embeddings are injected through the U-Net's cross-attention layers. Although the model works on compressed latents, it trains with TREPA, LPIPS, and SyncNet losses computed in pixel space, avoiding intermediate motion representations. It supports multi-resolution training (256×256 to 512×512) with configurable efficiency modes, requiring roughly 20–55 GB of VRAM depending on stage and resolution.
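The conditioning path described above can be sketched as a single cross-attention update: queries come from the video latent tokens, while keys and values come from the audio embeddings. This is a minimal NumPy sketch under assumed toy dimensions; the function name, shapes, and random projection weights are hypothetical illustrations (in the real U-Net the projections are learned), not LatentSync's actual code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def audio_cross_attention(latents, audio_emb, d_head=64, seed=0):
    """One cross-attention step: queries from video latent tokens,
    keys/values from audio embeddings (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    d_lat = latents.shape[-1]
    d_aud = audio_emb.shape[-1]
    # Random stand-ins for the learned projection matrices.
    Wq = rng.standard_normal((d_lat, d_head)) / np.sqrt(d_lat)
    Wk = rng.standard_normal((d_aud, d_head)) / np.sqrt(d_aud)
    Wv = rng.standard_normal((d_aud, d_lat)) / np.sqrt(d_aud)
    Q = latents @ Wq      # (n_tokens, d_head)
    K = audio_emb @ Wk    # (n_audio, d_head)
    V = audio_emb @ Wv    # (n_audio, d_lat)
    attn = softmax(Q @ K.T / np.sqrt(d_head))  # each latent token attends over audio frames
    return latents + attn @ V                  # residual update, as in U-Net attention blocks

# Toy shapes: 16 latent tokens of width 320 attending over
# 10 frames of 384-dim audio embeddings (both sizes assumed).
latents = np.zeros((16, 320))
audio = np.ones((10, 384))
out = audio_cross_attention(latents, audio)
print(out.shape)  # (16, 320)
```

The key point the sketch captures is that audio enters only through keys and values, so the latent token count (and hence the output video resolution) is independent of the number of audio frames.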
About TempoSyncDiff
mazumdarsoumya/TempoSyncDiff
Few-step diffusion for audio-driven talking-head generation: making diffusion models speak faster without losing their composure.
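The repository's tagline is the only technical description available, so as an illustration of what "few-step diffusion" generally means, here is a deterministic DDIM-style sampler that strides a long noise schedule down to a handful of steps. Everything here is a generic sketch under assumed names and schedules, not TempoSyncDiff's actual method; the `oracle` denoiser is a hypothetical stand-in for a trained network.

```python
import numpy as np

def ddim_sample(x, denoise_fn, alphas_cumprod, n_steps=4):
    """Few-step deterministic sampling: take large jumps along a
    strided subset of the full noise schedule instead of all T steps."""
    T = len(alphas_cumprod)
    steps = np.linspace(T - 1, 0, n_steps, dtype=int)  # e.g. [999, 666, 333, 0]
    for i, t in enumerate(steps):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[steps[i + 1]] if i + 1 < n_steps else 1.0
        eps = denoise_fn(x, t)                               # predicted noise
        x0 = (x - np.sqrt(1 - a_t) * eps) / np.sqrt(a_t)     # predicted clean sample
        x = np.sqrt(a_prev) * x0 + np.sqrt(1 - a_prev) * eps # deterministic DDIM step
    return x

# Schedule: alphas_cumprod decreases from ~1 (clean) to ~0 (pure noise).
alphas_cumprod = np.linspace(0.9999, 0.02, 1000)

# Hypothetical "oracle" denoiser that knows the clean target is all zeros,
# so eps = x / sqrt(1 - a_t) recovers x0 = 0 exactly at every step.
oracle = lambda x, t: x / np.sqrt(1 - alphas_cumprod[t])

x_T = np.random.default_rng(0).standard_normal(8)  # start from noise
x0 = ddim_sample(x_T, oracle, alphas_cumprod, n_steps=4)
print(np.allclose(x0, 0))  # True: 4 strided steps suffice with this oracle
```

With a real learned denoiser the few-step result is only approximate, which is why few-step methods typically pair strided sampling with distillation or consistency training to retain quality.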