CLIP Image Embeddings Transformer Models

Tools for generating and working with CLIP image-text embeddings, including implementations, fine-tuning, and lightweight variants. Does NOT include general vision-language models, text-to-image generation, or multimodal fusion frameworks.

There are 23 CLIP image embedding models tracked. The highest-rated is OFA-Sys/Chinese-CLIP, which scores 48/100 and has 5,820 stars.

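Get all 23 projects as JSON (the example request below sets limit=20, so a higher limit may be needed to retrieve every project):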

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=clip-image-embeddings&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
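The same request can be made from a script. The following is a minimal sketch using only the Python standard library; the helper names `build_url` and `fetch_projects` are hypothetical, and because the response schema is not documented here, the result is returned as parsed JSON without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the query URL for the CLIP image embeddings subcategory."""
    params = {
        "domain": "transformers",
        "subcategory": "clip-image-embeddings",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=30):
    """Fetch the listing and return the decoded JSON payload.

    Assumption: the endpoint returns a JSON body. Its structure is not
    specified on this page, so no keys are accessed here.
    """
    request = Request(build_url(limit), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects()` issues one request, which counts against the daily quota. Whether the `limit` parameter accepts values above 20 is an assumption that should be checked against the API itself.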

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | OFA-Sys/Chinese-CLIP | Chinese version of CLIP which achieves Chinese cross-modal retrieval and... | 48 | Emerging |
| 2 | Kaushalya/medclip | A multi-modal CLIP model trained on the medical dataset ROCO | 44 | Emerging |
| 3 | kastalimohammed1965/CLIP-fine-tune-registers-gated | Vision Transformers Needs Registers. And Gated MLPs. And +20M params. Tiny... | 39 | Emerging |
| 4 | BUAADreamer/SPN4CIR | [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning... | 35 | Emerging |
| 5 | clip-italian/clip-italian | CLIP (Contrastive Language–Image Pre-training) for Italian | 32 | Emerging |
| 6 | zer0int/CLIP-fine-tune-registers-gated | Vision Transformers Needs Registers. And Gated MLPs. And +20M params. Tiny... | 29 | Experimental |
| 7 | YUSH19883/cog-jinaai-jina-clip-v2 | 🖼️ Generate high-quality multimodal embeddings for text and images with Jina... | 22 | Experimental |
| 8 | Armaggheddon/ClipServe | 🚀 ClipServe: A fast API server for embedding text, images, and performing... | 21 | Experimental |
| 9 | kyegomez/MuonClip | This repository is an open source implementation of the MuonClip strategy... | 21 | Experimental |
| 10 | taherfattahi/MetaWorld-VLA-openai-clip-vit | A lightweight Vision-Language-Action (VLA) baseline for MetaWorld robot-arm... | 18 | Experimental |
| 11 | safinal/compositional-image-retrieval | Solution for the First Challenge of the Main Phase in the Rayan... | 17 | Experimental |
| 12 | iKrishneel/zsis | CLIP based Zero Shot Instance Segmentation | 16 | Experimental |
| 13 | FuxiaoLiu/DocumentCLIP | [ICPRAI 2024] DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents | 15 | Experimental |
| 14 | theSohamTUmbare/CLIP-model | Reimplementation of the CLIP model | 13 | Experimental |
| 15 | SuryaAnything/V-DeClip | Masked Multi-Component Gated Decomposition Architecture | 13 | Experimental |
| 16 | zsxkib/cog-jinaai-jina-clip-v2 | Jina CLIP v2 - Multimodal embedding model for text and images with... | 12 | Experimental |
| 17 | VijayPrakashReddy-k/CLIP-PACL | Contrastive Language - Image Pre-training (CLIP) and Patch Aligned... | 12 | Experimental |
| 18 | MuhammadAliS/CLIP | PyTorch implementation of OpenAI's CLIP model for image classification,... | 12 | Experimental |
| 19 | corentin-ryr/CLIP-mixer | Implementation of CLIP using a Mixer architecture | 12 | Experimental |
| 20 | ntat/Lightweight_CLIP_model | A lightweight Pytorch implementation of OpenAI's CLIP model. | 11 | Experimental |
| 21 | Rakshath66/ClipFindr | 🔍 A CLIP-powered image similarity finder built with Streamlit - upload a... | 11 | Experimental |
| 22 | seanghay/clipsort | Group images by provided labels using OpenAI/CLIP | 10 | Experimental |
| 23 | ptmorris03/CLIPEmbedding | Easy text-image embedding and similarity with pretrained CLIP in PyTorch | 10 | Experimental |