# CLIP Image Embeddings Transformer Models
Tools for generating and working with CLIP image-text embeddings, including implementations, fine-tuning, and lightweight variants. Does NOT include general vision-language models, text-to-image generation, or multimodal fusion frameworks.
There are 23 CLIP image-embedding models tracked. The highest-rated is OFA-Sys/Chinese-CLIP at 48/100 with 5,820 stars.
Get all 23 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=clip-image-embeddings&limit=20"
```

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
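The same request can be made from Python with the standard library. A minimal sketch, assuming only the endpoint and query parameters shown in the curl command above; the shape of the returned JSON (a list of model records) is an assumption, not documented here:

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query parameters taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"
params = {
    "domain": "transformers",
    "subcategory": "clip-image-embeddings",
    "limit": 20,
}
url = f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_models(url: str):
    """Fetch the dataset as JSON. The response schema is an assumption."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# No key is required for up to 100 requests/day:
# models = fetch_models(url)
```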
| # | Model | Description | Tier |
|---|---|---|---|
| 1 | OFA-Sys/Chinese-CLIP | Chinese version of CLIP which achieves Chinese cross-modal retrieval and... | Emerging |
| 2 | Kaushalya/medclip | A multi-modal CLIP model trained on the medical dataset ROCO | Emerging |
| 3 | kastalimohammed1965/CLIP-fine-tune-registers-gated | Vision Transformers Needs Registers. And Gated MLPs. And +20M params. Tiny... | Emerging |
| 4 | BUAADreamer/SPN4CIR | [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning... | Emerging |
| 5 | clip-italian/clip-italian | CLIP (Contrastive Language–Image Pre-training) for Italian | Emerging |
| 6 | zer0int/CLIP-fine-tune-registers-gated | Vision Transformers Needs Registers. And Gated MLPs. And +20M params. Tiny... | Experimental |
| 7 | YUSH19883/cog-jinaai-jina-clip-v2 | 🖼️ Generate high-quality multimodal embeddings for text and images with Jina... | Experimental |
| 8 | Armaggheddon/ClipServe | 🚀 ClipServe: A fast API server for embedding text, images, and performing... | Experimental |
| 9 | kyegomez/MuonClip | This repository is an open source implementation of the MuonClip strategy... | Experimental |
| 10 | taherfattahi/MetaWorld-VLA-openai-clip-vit | A lightweight Vision-Language-Action (VLA) baseline for MetaWorld robot-arm... | Experimental |
| 11 | safinal/compositional-image-retrieval | Solution for the First Challenge of the Main Phase in the Rayan... | Experimental |
| 12 | iKrishneel/zsis | CLIP based Zero Shot Instance Segmentation | Experimental |
| 13 | FuxiaoLiu/DocumentCLIP | [ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents | Experimental |
| 14 | theSohamTUmbare/CLIP-model | Reimplementation of the CLIP model | Experimental |
| 15 | SuryaAnything/V-DeClip | Masked Multi-Component Gated Decomposition Architecture | Experimental |
| 16 | zsxkib/cog-jinaai-jina-clip-v2 | Jina CLIP v2 - Multimodal embedding model for text and images with... | Experimental |
| 17 | VijayPrakashReddy-k/CLIP-PACL | Contrastive Language - Image Pre-training (CLIP) and Patch Aligned... | Experimental |
| 18 | MuhammadAliS/CLIP | PyTorch implementation of OpenAI's CLIP model for image classification,... | Experimental |
| 19 | corentin-ryr/CLIP-mixer | Implementation of CLIP using a Mixer architecture | Experimental |
| 20 | ntat/Lightweight_CLIP_model | A lightweight Pytorch implementation of OpenAI's CLIP model. | Experimental |
| 21 | Rakshath66/ClipFindr | 🔍 A CLIP-powered image similarity finder built with Streamlit — upload a... | Experimental |
| 22 | seanghay/clipsort | Group images by provided labels using OpenAI/CLIP | Experimental |
| 23 | ptmorris03/CLIPEmbedding | Easy text-image embedding and similarity with pretrained CLIP in PyTorch | Experimental |
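Whichever model above produces the embeddings, the common downstream step for retrieval and similarity tools like these is cosine similarity between L2-normalized image and text vectors. A minimal NumPy sketch using small toy vectors as stand-ins for real CLIP outputs (the dimensions and values here are illustrative only, not from any model in the list):

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between each row of `a` and each row of `b`."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


# Toy stand-ins for CLIP outputs: 2 image embeddings, 3 text embeddings, dim 4.
image_emb = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0]])
text_emb = np.array([[0.9, 0.1, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])

sims = cosine_similarity(image_emb, text_emb)   # shape (2, 3)
best_text_per_image = sims.argmax(axis=1)       # best-matching text per image
```

Real CLIP pipelines typically apply the same normalize-then-dot-product step, sometimes scaled by a learned temperature before a softmax.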