VBench and ViStoryBench

These are complementary evaluation frameworks that address different aspects of video generation assessment: VBench provides general-purpose video quality metrics (temporal consistency, motion, aesthetics), while ViStoryBench specifically evaluates narrative coherence and story comprehension in AI-generated video sequences.

                    VBench             ViStoryBench
Score               80 (Verified)      45 (Emerging)
Maintenance         20/25              10/25
Adoption            18/25              10/25
Maturity            25/25              15/25
Community           17/25              10/25
Stars               1,537              139
Forks               107                8
Downloads           3,530
Commits (30d)       8                  0
Language            Python             Python
License             Apache-2.0         Apache-2.0
Risk flags          None               No Package, No Dependents

About VBench

Vchitect/VBench

[CVPR2024 Highlight] VBench - We Evaluate Video Generation

Provides hierarchical evaluation across 16+ dimensions (temporal consistency, motion smoothness, dynamic degree, etc.) with dimension-specific metrics and a curated prompt suite, enabling fine-grained assessment of video generation quality. Implements custom evaluation pipelines combining vision models (CLIP, optical flow, scene detection) with automatic metrics aligned to human preferences. Extends to image-to-video and long-form video evaluation while assessing trustworthiness dimensions like fairness and safety.
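The description above centers on dimension-wise scoring: each dimension has its own metric, and results are reported per dimension rather than as one aggregate number. A minimal sketch of that dispatch pattern follows. It is not VBench's actual API; the scorers are toy stand-ins operating on per-frame brightness values, and the function and dimension names are hypothetical.

```python
def score_temporal_consistency(frames):
    # Stand-in scorer: a real implementation would compare CLIP features
    # of adjacent frames. Here, smaller frame-to-frame change = higher score.
    if len(frames) < 2:
        return 1.0
    diffs = [abs(a - b) for a, b in zip(frames, frames[1:])]
    return max(0.0, 1.0 - sum(diffs) / len(diffs))

def score_dynamic_degree(frames):
    # Stand-in scorer: a real implementation would estimate optical-flow
    # magnitude. Here, larger frame-to-frame change = higher score.
    if len(frames) < 2:
        return 0.0
    return min(1.0, sum(abs(a - b) for a, b in zip(frames, frames[1:])) / len(frames))

# Registry mapping dimension names to their dedicated scorers.
DIMENSIONS = {
    "temporal_consistency": score_temporal_consistency,
    "dynamic_degree": score_dynamic_degree,
}

def evaluate_video(frames, dimension_list):
    """Run each requested dimension's scorer; return per-dimension scores."""
    return {dim: DIMENSIONS[dim](frames) for dim in dimension_list}
```

A static video (identical frames) would score 1.0 on temporal consistency and 0.0 on dynamic degree, illustrating why the dimensions must be reported separately: a single averaged number would hide the trade-off.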

About ViStoryBench

ViStoryBench/vistorybench

[CVPR 2026] ViStoryBench: AI Story Visualization Benchmark

Provides a modular evaluation framework built on a `BaseEvaluator` abstract class that supports pluggable metrics for assessing narrative consistency, character fidelity, and visual coherence across 80 diverse stories in Chinese and English. The benchmark includes standardized dataset adapters for major story visualization methods (StoryDiffusion, UNO, StoryGen, etc.) and handles long-text prompts via SD embeddings to overcome token limitations. Published results and an active leaderboard are maintained on HuggingFace and a dedicated web portal for continuous community evaluation.
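The `BaseEvaluator` abstract class described above is a standard pluggable-metric pattern: each metric subclasses a common interface, and the benchmark loops every registered evaluator over every story. The sketch below illustrates that pattern only; the method names, story fields, and the `run_benchmark` helper are assumptions, not ViStoryBench's actual code.

```python
from abc import ABC, abstractmethod

class BaseEvaluator(ABC):
    """One pluggable metric: scores a generated image sequence for one story."""

    name: str = "base"

    @abstractmethod
    def evaluate(self, story: dict, images: list) -> float:
        """Return a score in [0, 1] for this metric on one story."""

class CharacterFidelityEvaluator(BaseEvaluator):
    name = "character_fidelity"

    def evaluate(self, story, images):
        # Placeholder logic: a real metric would crop characters and compare
        # them against reference images with a vision encoder.
        return 1.0 if images else 0.0

def run_benchmark(stories, evaluators):
    """Average each registered metric over all stories."""
    totals = {ev.name: 0.0 for ev in evaluators}
    for story in stories:
        images = story.get("generated_images", [])
        for ev in evaluators:
            totals[ev.name] += ev.evaluate(story, images)
    return {name: total / len(stories) for name, total in totals.items()}
```

Because new metrics only need to subclass `BaseEvaluator`, the harness never changes when a metric is added, which is what makes the dataset adapters for StoryDiffusion, UNO, StoryGen, and others interchangeable under one evaluation loop.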

Scores updated daily from GitHub, PyPI, and npm data.