CLIP Vision Language ML Frameworks
Implementations, adaptations, and applications of CLIP and similar vision-language models for zero-shot classification, image-text matching, and multimodal tasks. Does NOT include other vision-language models (like BLIP or LLaVA), general multimodal frameworks, or unrelated CLIPS language systems.
There are 53 CLIP vision-language frameworks tracked; one scores above 70 (Verified tier). The highest-rated is mlfoundations/open_clip at 86/100, with 13,496 stars and 2,903,706 monthly downloads. 2 of the top 10 are actively maintained.
Get the tracked projects as JSON (the `limit` query parameter caps the number returned per request):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=clip-vision-language&limit=20"
```

The API is open to everyone at 100 requests/day with no key; a free key raises the limit to 1,000/day.
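The same endpoint is easy to call from Python with only the standard library. A minimal sketch: the query parameters mirror the curl example above, while the response schema and the `Authorization: Bearer` header name for keyed access are assumptions, since the page does not document them.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain, subcategory, limit=20):
    # Compose the query string for the quality-dataset endpoint.
    params = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{API_BASE}?{params}"

def fetch_projects(domain, subcategory, limit=20, api_key=None):
    # Anonymous access is rate-limited to 100 requests/day; a free key
    # allows 1,000/day (the header name here is an assumption).
    req = urllib.request.Request(build_url(domain, subcategory, limit))
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build the request URL for all 53 tracked projects.
url = build_url("ml-frameworks", "clip-vision-language", limit=53)
```

Raising `limit` to 53 in one request should return the full dataset, assuming the endpoint accepts limits above the default shown in the curl example.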
| # | Framework | Description | Score | Tier |
|---|---|---|---|---|
| 1 | mlfoundations/open_clip | An open source implementation of CLIP. | 86 | Verified |
| 2 | noxdafox/clipspy | Python CFFI bindings for the 'C' Language Integrated Production System CLIPS | | Established |
| 3 | openai/CLIP | CLIP (Contrastive Language-Image Pretraining), Predict the most relevant... | | Established |
| 4 | filipbasara0/simple-clip | A minimal, but effective implementation of CLIP (Contrastive Language-Image... | | Emerging |
| 5 | moein-shariatnia/OpenAI-CLIP | Simple implementation of OpenAI CLIP model in PyTorch. | | Emerging |
| 6 | BioMedIA-MBZUAI/FetalCLIP | Official repository of FetalCLIP: A Visual-Language Foundation Model for... | | Emerging |
| 7 | cliport/cliport | CLIPort: What and Where Pathways for Robotic Manipulation | | Emerging |
| 8 | WolodjaZ/MSAE | Interpreting CLIP with Hierarchical Sparse Autoencoders (ICML 2025) | | Emerging |
| 9 | Dalageo/paperclip-inspection | Analyzing Paper Clips Using Deep Learning and Computer Vision Techniques 📎 | | Emerging |
| 10 | noxdafox/iclips | CLIPS Jupyter console | | Emerging |
| 11 | SunzeY/AlphaCLIP | [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want | | Emerging |
| 12 | kyegomez/CLIPQ | A simple implementation of a CLIP that splits up an image into quadrants... | | Emerging |
| 13 | LeapLabTHU/Cross-Modal-Adapter | [Pattern Recognition 2025] Cross-Modal Adapter for Vision-Language Retrieval | | Emerging |
| 14 | svpino/clip-container | A containerized REST API around OpenAI's CLIP model. | | Emerging |
| 15 | lakeraai/onnx_clip | An ONNX-based implementation of the CLIP model that doesn't depend on torch... | | Emerging |
| 16 | SiddhantBikram/MemeCLIP | Official Repository for the paper 'MemeCLIP: Leveraging CLIP Representations... | | Emerging |
| 17 | jaisidhsingh/CoN-CLIP | Implementation of the "Learn No to Say Yes Better" paper. | | Emerging |
| 18 | merveenoyan/siglip | Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers... | | Emerging |
| 19 | kevinzakka/clip_playground | An ever-growing playground of notebooks showcasing CLIP's impressive... | | Emerging |
| 20 | sarthaxxxxx/BATCLIP | [ICCV '25] BATCLIP: Bimodal Online Test-Time Adaptation for CLIP | | Experimental |
| 21 | UCSC-VLAA/CLIPA | [NeurIPS 2023] This repository includes the official implementation of our... | | Experimental |
| 22 | Mauville/MedCLIP | Medical image captioning using OpenAI's CLIP | | Experimental |
| 23 | sixu0/SeisCLIP | The code of Paper 'SeisCLIP: A seismology foundation model pre-trained by... | | Experimental |
| 24 | aygong/ClipMind | Code for the paper "ClipMind: A Framework for Auditing Short-Format Video... | | Experimental |
| 25 | RobertBiehl/CLIP-tf2 | OpenAI CLIP converted to Tensorflow 2/Keras | | Experimental |
| 26 | bes-dev/pytorch_clip_bbox | Pytorch based library to rank predicted bounding boxes using text/image... | | Experimental |
| 27 | bes-dev/pytorch_clip_guided_loss | A simple library that implements CLIP guided loss in PyTorch. | | Experimental |
| 28 | LAION-AI/scaling-laws-openclip | Reproducible scaling laws for contrastive language-image learning... | | Experimental |
| 29 | KeremTurgutlu/clip_art | CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification - 4th... | | Experimental |
| 30 | halixness/understanding-CLIP | Repo from the "Learning with limited labeled data" seminar @ Uni of... | | Experimental |
| 31 | Krok1/adversarial-patch-for-clip | Adversarial patch system for privacy protection against CLIP image... | | Experimental |
| 32 | CoderChen01/InterCLIP-MEP | Official repository of the paper "InterCLIP-MEP: Interactive CLIP and... | | Experimental |
| 33 | ExcelsiorCJH/CLIP | CLIP: Learning Transferable Visual Models From Natural Language Supervision | | Experimental |
| 34 | zjunlp/SPEECH | [ACL 2023] SPEECH: Structured Prediction with Energy-Based Event-Centric Hyperspheres | | Experimental |
| 35 | sbmagar13/VQGAN-CLIP-Text-to-Image | Text-to-Image Synthesis using Multimodal (VQGAN + CLIP) Architectures | | Experimental |
| 36 | A-SHOJAEI/multimodal-contrastive-captioning-with-preference-aligned-generation | Vision-language model combining CLIP-style contrastive learning with... | | Experimental |
| 37 | your-ai-solution/generation-image-caption | This application fine-tunes the CLIP model on the Flickr8k dataset to align... | | Experimental |
| 38 | Evfidiw/MoBA | [ACMMM'24] MoBA: Mixture of Bi-directional Adapter for Multi-modal Sarcasm Detection | | Experimental |
| 39 | Jaso1024/Refining-Generated-Videos | IEEE 2023 \| REGIS: Refining Generated Videos via Iterative Stylistic Remodeling | | Experimental |
| 40 | D0miH/does-clip-know-my-face | Source Code for the JAIR Paper "Does CLIP Know my Face?" (Demo:... | | Experimental |
| 41 | Komorebirumu/awe-ms-20260316-1451-01 | AI Historical Document Authenticity Checker (Local Archives) | | Experimental |
| 42 | Jeyjey123456/ReVidgen | 🎥 Rethink video generation for the embodied world with ReVidgen, leveraging... | | Experimental |
| 43 | Bijay-kumar-sethy/clip | 🔍 Solve linear programming problems efficiently with Clp, an open-source... | | Experimental |
| 44 | ImtiazShuvo/clip-lora-food101-classification | Transfer learning and parameter-efficient fine-tuning of CLIP on the... | | Experimental |
| 45 | Fr0zenCrane/Cockatiel | The official implementation of our paper "Cockatiel: Ensembling Synthetic... | | Experimental |
| 46 | MingliangLiang3/GLIP | Centered Masking for Language-Image Pre-training | | Experimental |
| 47 | buraksatar/RoME_video_retrieval | It includes our two recent papers on text-to-video retrieval along with a... | | Experimental |
| 48 | jonkahana/CLIPPR | An official PyTorch implementation for CLIPPR | | Experimental |
| 49 | rhysdg/vision-at-a-clip | Low-latency ONNX and TensorRT based zero-shot classification and detection... | | Experimental |
| 50 | nicolafan/clipper | Explore your CLIP embeddings in a bidimensional space | | Experimental |
| 51 | KeithLin724/HAR_Clip | Human Action Recognition using Clip | | Experimental |
| 52 | MaharshPatelX/qwen-clip-multimodal | Multimodal Vision-AI: CLIP eyes + Qwen2.5 brain, 155 K-step pipeline & demo. | | Experimental |
| 53 | smb-h/mqirtn | Multimodal Query Enhancement for Image Retrieval using Transformer Networks (MQIRTN) | | Experimental |
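Most CLIP implementations in this list share one core mechanism for zero-shot classification: embed the image and each candidate text prompt into a joint space, then rank prompts by scaled cosine similarity. A minimal dependency-free sketch of that scoring step, using toy hand-made embeddings (the 3-dimensional vectors and the logit scale of 100 are illustrative; real models produce 512-plus-dimensional embeddings):

```python
import math

def normalize(v):
    # Project a vector onto the unit sphere, as CLIP does
    # before computing similarities.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def zero_shot_scores(image_emb, text_embs, logit_scale=100.0):
    # Rank class prompts by scaled cosine similarity between the
    # image embedding and each text embedding, then softmax.
    img = normalize(image_emb)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, normalize(t)))
        for t in text_embs
    ]
    return softmax(logits)

# Toy example: the image embedding aligns with the first prompt,
# so nearly all probability mass lands on class 0.
probs = zero_shot_scores([1.0, 0.1, 0.0],
                         [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

In a real framework such as open_clip, the two `normalize` calls correspond to L2-normalizing the image and text encoder outputs, and `logit_scale` is a learned temperature; the ranking logic is otherwise the same.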