sharpsalt/Captionforge-Multimodal-Image-Captioning-System

This PyTorch-based image captioning model uses ResNet-50 encoder and Transformer decoder to generate descriptive captions from Flickr8k images. Features include data augmentation (image transforms, synonym replacement), training with AdamW and early stopping, and inference via greedy, beam search (k=3,5,10), or nucleus sampling.

/ 100

Experimental

No License No Package No Dependents

Maintenance 10 / 25

Adoption 0 / 25

Maturity 1 / 25

Community 0 / 25

How are scores calculated?

Stars

—

Forks

—

Language

Python

License

—

Category

image-captioning-transformers

Last pushed

Mar 03, 2026

Commits (30d)

GitHub

Image Captioning Transformers · 28 models

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sharpsalt/Captionforge-Multimodal-Image-Captioning-System"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.

Higher-rated alternatives

zarzouram/image_captioning_with_transformers

Pytorch implementation of image captioning using transformer-based model.

rese1f/aurora

[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

senadkurtisi/pytorch-image-captioning

Transformer & CNN Image Captioning model in PyTorch.

tanishqgautam/Image-Captioning

Implemented 3 different architectures to tackle the Image Caption problem, i.e, Merged...

ilya16/deephumor

DeepHumor: Image-based Meme Generation using Deep Learning

Explore Transformer Models

All categories Trending Transformer directory Insights