rese1f/aurora

[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

/ 100

Emerging

Employs token merging to reduce visual token count before LLM processing, enabling efficient video captioning while maintaining performance across inference and training. Includes the Video Detailed Caption (VDC) benchmark and integrates with lmms-eval for standardized evaluation, with planned support for HuggingFace transformers and SGLang deployment. Provides both XTuner and HuggingFace format weights, supporting continued training and inference through multiple framework backends.

139 stars. No commits in the last 6 months.

Stale 6m No Package No Dependents

Maintenance 2 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 8 / 25

How are scores calculated?

Stars

139

Forks

Language

Python

License

Apache-2.0

Higher-rated alternatives

zarzouram/image_captioning_with_transformers

Pytorch implementation of image captioning using transformer-based model.

senadkurtisi/pytorch-image-captioning

Transformer & CNN Image Captioning model in PyTorch.

tanishqgautam/Image-Captioning

Implemented 3 different architectures to tackle the Image Caption problem, i.e, Merged...

ilya16/deephumor

DeepHumor: Image-based Meme Generation using Deep Learning

Hamtech-ai/Persian-Image-Captioning

A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.

Explore Transformer Models

All categories Trending Transformer directory Insights