inuwamobarak/Image-captioning-ViT
An image-captioning project built on Vision Transformers (ViTs): transformer models that generate descriptive captions for images by combining the strengths of Transformers and computer vision. It leverages state-of-the-art pre-trained ViT models and employs techniques such as transfer learning.
The architecture combines a pre-trained ViT encoder for image feature extraction with a transformer-based decoder for caption generation, employing transfer learning to reduce training overhead. It includes fine-tuning capabilities for custom datasets and evaluates output quality using standard metrics like BLEU, METEOR, and CIDEr. Built on PyTorch and the Hugging Face Transformers library, it also integrates LitServe for deploying the model as a production-ready inference server.
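The encoder-decoder pairing described above can be sketched with the Hugging Face Transformers `VisionEncoderDecoderModel` class. The tiny configuration sizes below are illustrative only; the repo's actual checkpoint, decoder choice, and hyperparameters may differ.

```python
# Sketch of a ViT encoder + transformer decoder for captioning, using
# randomly initialized tiny configs (no weights are downloaded).
import torch
from transformers import (
    GPT2Config,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# ViT encoder: 32x32 input split into 8x8 patches -> 16 patch tokens + [CLS]
encoder_cfg = ViTConfig(
    image_size=32, patch_size=8,
    hidden_size=32, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=64,
)
# GPT-2-style decoder; from_encoder_decoder_configs() wires up
# cross-attention to the encoder automatically.
decoder_cfg = GPT2Config(vocab_size=100, n_embd=32, n_layer=2, n_head=2)

config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_cfg, decoder_cfg
)
model = VisionEncoderDecoderModel(config=config)

pixel_values = torch.randn(1, 3, 32, 32)  # one dummy RGB image
decoder_input_ids = torch.tensor([[0]])   # a single start token
out = model(pixel_values=pixel_values, decoder_input_ids=decoder_input_ids)
print(out.logits.shape)  # (batch, decoder sequence length, vocab size)
```

In practice one would load a pre-trained checkpoint with `VisionEncoderDecoderModel.from_pretrained(...)` and fine-tune on caption pairs, which is the transfer-learning setup the description refers to.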
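For the evaluation side, a minimal sketch of scoring a generated caption with BLEU (one of the metrics listed above) using NLTK; the reference and candidate captions here are made-up examples, not outputs of the repo's model.

```python
# Score a candidate caption against a reference caption with sentence-level
# BLEU. Smoothing avoids zero scores when a high-order n-gram has no overlap.
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference = "a dog runs across the green field".split()
candidate = "a dog runs across a field".split()

smooth = SmoothingFunction().method1
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```

Corpus-level BLEU, METEOR, and CIDEr are computed analogously over the whole validation set rather than a single sentence pair.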
No commits in the last 6 months.
Stars
40
Forks
5
Language
Jupyter Notebook
License
—
Category
Last pushed
Oct 14, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/inuwamobarak/Image-captioning-ViT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
stevan-milovanovic/LiteRT-for-Android
Image Classification, Image Captioning and LLM inference with LiteRT
floydhub/pix2code-template
Build a neural network to code a basic HTML and CSS website based on a picture of a design mockup.
ekkonwork/qwen3-vl-autotagger-cli
Standalone CLI for Qwen3-VL auto-tagging with optional XMP embedding.
ABX9801/Image-Caption-Generator
A Web App to generate caption for Images. VGG-16 Model is used to encode the images and...
regiellis/ecko-cli
ecko-cli is a simple CLI tool that streamlines the process of processing images in a directory,...