sharpsalt/Captionforge-Multimodal-Image-Captioning-System
This PyTorch-based image captioning model uses ResNet-50 encoder and Transformer decoder to generate descriptive captions from Flickr8k images. Features include data augmentation (image transforms, synonym replacement), training with AdamW and early stopping, and inference via greedy, beam search (k=3,5,10), or nucleus sampling.
Stars
—
Forks
—
Language
Python
License
—
Category
Last pushed
Mar 03, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sharpsalt/Captionforge-Multimodal-Image-Captioning-System"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
zarzouram/image_captioning_with_transformers
Pytorch implementation of image captioning using transformer-based model.
rese1f/aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
senadkurtisi/pytorch-image-captioning
Transformer & CNN Image Captioning model in PyTorch.
tanishqgautam/Image-Captioning
Implemented 3 different architectures to tackle the Image Caption problem, i.e, Merged...
ilya16/deephumor
DeepHumor: Image-based Meme Generation using Deep Learning