DiffGesture and DiffuseStyleGesture

Both tools address audio-driven co-speech gesture generation with diffusion models, but with different emphases: DiffGesture focuses on the core diffusion-based generation approach, while DiffuseStyleGesture extends it with explicit style control. This makes them complementary techniques that could be combined rather than direct competitors.

| Metric | DiffGesture | DiffuseStyleGesture |
| --- | --- | --- |
| Overall score | 52 (Established) | 50 (Established) |
| Maintenance | 13/25 | 6/25 |
| Adoption | 10/25 | 10/25 |
| Maturity | 16/25 | 16/25 |
| Community | 13/25 | 18/25 |
| Stars | 261 | 206 |
| Forks | 19 | 31 |
| Downloads | | |
| Commits (30d) | 0 | 0 |
| Language | Python | Python |
| License | GPL-3.0 | MIT |
| Package | None | None |
| Dependents | None | None |

About DiffGesture

Advocate99/DiffGesture

[CVPR'2023] Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

Employs a Diffusion Audio-Gesture Transformer to jointly model cross-modal audio-to-skeleton associations while preserving temporal coherence through an annealed noise sampling strategy. Integrates classifier-free guidance for diversity-quality trade-offs and uses pretrained autoencoders (from HA2G) to compute perceptual metrics on the TED Gesture and TED Expressive datasets. Supports both short and long video synthesis, generating skeleton sequences conditioned on audio input.
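
To make the classifier-free guidance trade-off concrete, the sketch below shows the standard guided denoising step: the same network produces an audio-conditioned and an unconditional noise estimate, and the two are blended. The `model(x, t, cond=...)` interface and the guidance scale value are assumptions for illustration, not DiffGesture's actual API.

```python
import torch

def cfg_noise(model, x_t, t, audio_cond, guidance_scale=1.5):
    """One classifier-free guidance step (hypothetical `model` interface).

    Blends audio-conditioned and unconditional noise predictions; a larger
    `guidance_scale` pushes samples toward the audio condition (quality)
    at the cost of diversity.
    """
    eps_cond = model(x_t, t, cond=audio_cond)  # prediction given the audio
    eps_uncond = model(x_t, t, cond=None)      # prediction with condition dropped
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

During training the condition is randomly dropped so one network learns both predictions; at sampling time each reverse-diffusion step uses the blended estimate above in place of the plain conditional one.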

About DiffuseStyleGesture

YoungSeng/DiffuseStyleGesture

DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models (IJCAI 2023) | The DiffuseStyleGesture+ entry to the GENEA Challenge 2023 (ICMI 2023, Reproducibility Award)

Leverages diffusion models with WavLM audio embeddings to generate stylized full-body gestures conditioned on speech, with controllable style and intensity parameters. Training uses LMDB-based pipelines on mocap datasets (ZEGGS, BEAT, TWH), and the model outputs motion in BVH format compatible with Blender visualization. Downstream extensions build on it with motion matching (QPGesture) and multi-dataset training (UnifiedGesture), and pre-trained checkpoints are available for inference.
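
As a rough sketch of the WavLM conditioning step, the snippet below extracts frame-level speech embeddings with Hugging Face `transformers`. The checkpoint name and the idea of feeding `last_hidden_state` to the denoiser are assumptions about one plausible wiring, not the repository's exact code.

```python
import torch
from transformers import AutoFeatureExtractor, WavLMModel

# Checkpoint name is an assumption; any WavLM variant behaves similarly.
extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus")
wavlm = WavLMModel.from_pretrained("microsoft/wavlm-base-plus")

speech = torch.randn(16000)  # stand-in for 1 s of 16 kHz speech
inputs = extractor(speech.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    # Frame-level embeddings, roughly one 768-dim vector per 20 ms of audio;
    # these would be time-aligned with motion frames and passed to the
    # diffusion denoiser as the speech condition.
    audio_features = wavlm(**inputs).last_hidden_state  # (1, frames, 768)
```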

Scores updated daily from GitHub, PyPI, and npm data.