rese1f/aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Employs token merging to reduce visual token count before LLM processing, enabling efficient video captioning while maintaining performance across inference and training. Includes the Video Detailed Caption (VDC) benchmark and integrates with lmms-eval for standardized evaluation, with planned support for HuggingFace transformers and SGLang deployment. Provides both XTuner and HuggingFace format weights, supporting continued training and inference through multiple framework backends.
139 stars. No commits in the last 6 months.
Stars
139
Forks
6
Language
Python
License
Apache-2.0
Category
Last pushed
Jun 04, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/rese1f/aurora"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
zarzouram/image_captioning_with_transformers
Pytorch implementation of image captioning using transformer-based model.
senadkurtisi/pytorch-image-captioning
Transformer & CNN Image Captioning model in PyTorch.
tanishqgautam/Image-Captioning
Implemented 3 different architectures to tackle the Image Caption problem, i.e, Merged...
ilya16/deephumor
DeepHumor: Image-based Meme Generation using Deep Learning
Hamtech-ai/Persian-Image-Captioning
A Persian Image Captioning model based on Vision Encoder Decoder Models of the transformers🤗.