VBench and ChronoMagic-Bench

These are complementary evaluation frameworks for video generation: VBench provides general-purpose video quality metrics across multiple dimensions, while ChronoMagic-Bench specializes in evaluating temporal coherence and metamorphic transformations in time-lapse video generation.

VBench: 73 (Verified)
Maintenance: 20/25
Adoption: 18/25
Maturity: 18/25
Community: 17/25
Stars: 1,537
Forks: 107
Downloads: 3,530
Commits (30d): 8
Language: Python
License: Apache-2.0
Risk flags: none

ChronoMagic-Bench: 43 (Emerging)
Maintenance: 13/25
Adoption: 10/25
Maturity: 9/25
Community: 11/25
Stars: 210
Forks: 14
Downloads: (none listed)
Commits (30d): 0
Language: Python
License: Apache-2.0
Risk flags: No Package, No Dependents
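
The overall scores appear to be the sum of the four 25-point category scores (20 + 18 + 18 + 17 = 73 for VBench, 13 + 10 + 9 + 11 = 43 for ChronoMagic-Bench). A minimal sketch that checks this arithmetic follows; the function name `overall_score` and the dictionary layout are illustrative assumptions, not part of any scoring API.

```python
# Category scores as listed above, each out of 25.
SCORES = {
    "VBench": {"Maintenance": 20, "Adoption": 18, "Maturity": 18, "Community": 17},
    "ChronoMagic-Bench": {"Maintenance": 13, "Adoption": 10, "Maturity": 9, "Community": 11},
}


def overall_score(categories):
    """Sum the four 25-point category scores into a 0-100 total (assumed rule)."""
    return sum(categories.values())


for name, categories in SCORES.items():
    print(f"{name}: {overall_score(categories)}/100")
```

The totals match the headline scores shown for both projects, which suggests the four categories are weighted equally.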

About VBench

Vchitect/VBench

[CVPR2024 Highlight] VBench - We Evaluate Video Generation

Provides hierarchical evaluation across 16+ dimensions (temporal consistency, motion smoothness, dynamic degree, etc.) with dimension-specific metrics and a curated prompt suite, enabling fine-grained assessment of video generation quality. Implements custom evaluation pipelines combining vision models (CLIP, optical flow, scene detection) with automatic metrics aligned to human preferences. Extends to image-to-video and long-form video evaluation while assessing trustworthiness dimensions like fairness and safety.
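
To make the idea of a dimension-specific metric concrete, here is a minimal sketch of one way a temporal-consistency score could be computed: the mean cosine similarity between feature vectors of adjacent frames. This is an illustrative assumption, not VBench's actual implementation. The function names and the toy feature vectors are hypothetical; a real pipeline would take per-frame features from a vision model such as CLIP.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def temporal_consistency(frame_features):
    """Mean cosine similarity between each frame's features and the next frame's."""
    if len(frame_features) < 2:
        raise ValueError("need at least two frames")
    pairs = zip(frame_features, frame_features[1:])
    sims = [cosine_similarity(prev, nxt) for prev, nxt in pairs]
    return sum(sims) / len(sims)


# Toy per-frame features (hypothetical): a steady clip and a flickering one.
steady = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
flicker = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

print("steady: ", temporal_consistency(steady))
print("flicker:", temporal_consistency(flicker))
```

In this sketch the steady clip scores higher than the flickering one, which is the behavior a consistency dimension is meant to capture.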

About ChronoMagic-Bench

PKU-YuanGroup/ChronoMagic-Bench

[NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation

Provides metamorphic evaluation of text-to-video models through time-lapse generation tasks grounded in physics, biology, and chemistry priors, with curated ChronoMagic-Pro datasets containing 460K+ video-text pairs. Introduces CHScore, a robust temporal coherence metric for assessing physics-aware transformations, and hosts an open leaderboard for benchmarking diverse text-to-video models including proprietary systems like Sora.

Scores updated daily from GitHub, PyPI, and npm data.