Lahdhirim/CV-image-captioning-clip-gpt2
Image caption generation with a hybrid CLIP-GPT2 architecture: CLIP encodes the image and GPT-2 decodes the resulting representation into a natural-language caption. The repo provides modular, configurable pipelines for training, inference, and evaluation on datasets such as COCO.
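For orientation, here is a minimal sketch of how a CLIP-encoder / GPT-2-decoder hybrid can be wired together, using a ClipCap-style "prefix" projection. This is an assumption about the general technique, not the repo's actual code: the checkpoint names, prefix length, and image path are illustrative, and the untrained projection will not yield meaningful captions until fine-tuned.

import torch
from PIL import Image
from transformers import (CLIPImageProcessor, CLIPVisionModel,
                          GPT2LMHeadModel, GPT2Tokenizer)

# Frozen CLIP vision encoder and a stock GPT-2 decoder (assumed checkpoints).
clip = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Trainable bridge: map the pooled CLIP image embedding into a short sequence
# of GPT-2 input embeddings ("prefix tokens") that condition generation.
prefix_len = 4
project = torch.nn.Linear(clip.config.hidden_size, gpt2.config.n_embd * prefix_len)

image = Image.open("example.jpg").convert("RGB")  # hypothetical input image
pixels = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    image_emb = clip(pixel_values=pixels).pooler_output  # (1, 768)
    prefix = project(image_emb).view(1, prefix_len, gpt2.config.n_embd)
    ids = gpt2.generate(inputs_embeds=prefix, max_new_tokens=20,
                        pad_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(ids[0], skip_special_tokens=True))

In training, the projection (and optionally GPT-2 itself) would be optimized with the usual next-token loss on caption data such as COCO.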
No commits in the last 6 months.
Stars: 3
Forks: —
Language: Jupyter Notebook
License: —
Category: —
Last pushed: Aug 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Lahdhirim/CV-image-captioning-clip-gpt2"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
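To call the same endpoint from Python, a minimal sketch (assuming the endpoint returns JSON; the response schema and the key mechanism are not documented here, so inspect the payload before relying on specific fields):

import requests

url = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/Lahdhirim/CV-image-captioning-clip-gpt2")
resp = requests.get(url, timeout=10)
resp.raise_for_status()  # surfaces 4xx/5xx, e.g. when the daily quota is hit
print(resp.json())       # assumed JSON body; schema not specified on this page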
Higher-rated alternatives
jmisilo/clip-gpt-captioning
CLIPxGPT Captioner is an image captioning model based on OpenAI's CLIP and GPT-2.
leaderj1001/CLIP
CLIP: Connecting Text and Image (Learning Transferable Visual Models From Natural Language Supervision)
PathologyFoundation/plip
Pathology Language and Image Pre-Training (PLIP) is the first vision and language foundation...
kesimeg/turkish-clip
Trains OpenAI's CLIP model for Turkish using a pretrained ResNet and DistilBERT.