inuwamobarak/Image-captioning-ViT

Image Captioning with Vision Transformers (ViTs): a transformer model that generates descriptive captions for images by combining the power of Transformers and computer vision. It leverages state-of-the-art pre-trained ViT models and employs transfer-learning techniques.

Score: 26 / 100 (Experimental)

The architecture combines a pre-trained ViT encoder for image feature extraction with a transformer-based decoder for caption generation, employing transfer learning to reduce training overhead. It includes fine-tuning capabilities for custom datasets and evaluates output quality using standard metrics like BLEU, METEOR, and CIDEr. Built on PyTorch and the Hugging Face Transformers library, it also integrates LitServe for deploying the model as a production-ready inference server.
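The encoder-decoder pattern described above can be sketched with Hugging Face's `VisionEncoderDecoderModel`. This is a minimal illustration, not the repository's exact code; the `nlpconnect/vit-gpt2-image-captioning` checkpoint is an assumption standing in for whatever weights the project actually fine-tunes.

```python
# Minimal sketch of ViT-based image captioning with Hugging Face
# Transformers. The checkpoint name is an assumption; the repository
# may use different weights or its own fine-tuned model.

def caption_image(image_path: str,
                  checkpoint: str = "nlpconnect/vit-gpt2-image-captioning") -> str:
    """Return a generated caption for the image at `image_path`."""
    # Imports live inside the function so the sketch can be read (and
    # the function defined) without the heavy dependencies installed.
    from PIL import Image
    from transformers import (AutoTokenizer, ViTImageProcessor,
                              VisionEncoderDecoderModel)

    model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
    processor = ViTImageProcessor.from_pretrained(checkpoint)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values

    # The ViT encoder embeds the image; the transformer decoder then
    # autoregressively generates the caption tokens.
    output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(caption_image("example.jpg"))  # path is illustrative
```

Beam search (`num_beams=4`) is a common decoding choice for captioning; greedy decoding also works but tends to produce blander captions.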

No commits in the last 6 months.

No License · Stale (6 months) · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 11 / 25


Stars: 40
Forks: 5
Language: Jupyter Notebook
License: None
Category: image-captioning
Last pushed: Oct 14, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/inuwamobarak/Image-captioning-ViT"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
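The same lookup can be scripted. A small sketch using only the Python standard library; the helper names are ours, and the shape of the JSON response is not documented here, so it is returned as a plain dict:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/generative-ai"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a repository."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality report as a dict (requires network access)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    report = fetch_quality("inuwamobarak", "Image-captioning-ViT")
    print(json.dumps(report, indent=2))
```

With no API key this uses the anonymous quota (100 requests/day); a free key raises it to 1,000/day, as noted above.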