openai/CLIP

CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image.

Score: 60 / 100 (Established)

Trained on 400M image-text pairs using contrastive learning, CLIP jointly encodes images and text into a shared embedding space where cosine similarity enables zero-shot classification without task-specific fine-tuning. Built on Vision Transformers and text encoders in PyTorch, it integrates seamlessly with torchvision for preprocessing and supports multiple model scales (ViT-B/32, ViT-L/14, etc.) for deployment flexibility.
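A minimal sketch of the zero-shot workflow described above, following the usage pattern in the repository's README; the image path "photo.jpg" and the candidate labels are placeholders:

import torch
import clip
from PIL import Image

# Load a model scale by name; ViT-B/32 is the smallest ViT variant.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "photo.jpg" and the labels below are placeholders.
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    # Encode both modalities into the shared embedding space.
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Normalize so the dot product is cosine similarity, then softmax
    # over the candidate labels for zero-shot classification.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", probs.cpu().numpy())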

32,796 stars. Maintained, with 1 commit in the last 30 days.

No package published; no dependents tracked.

Maintenance: 13 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 21 / 25

The four subscores sum to the overall score of 60 / 100.


Stars: 32,796
Forks: 3,961
Language: Jupyter Notebook
License: MIT
Last pushed: Feb 18, 2026
Commits (30d): 1

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/openai/CLIP"

Open to everyone: 100 requests/day with no API key. Get a free key for 1,000 requests/day.
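For programmatic access, a minimal Python sketch of the same call using requests; the response schema is not documented here, so the JSON body is printed as-is:

import requests

# Public endpoint from above; 100 requests/day without a key.
url = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/openai/CLIP"

resp = requests.get(url, timeout=10)
resp.raise_for_status()

# Assuming a JSON body; the exact field names are not documented here.
print(resp.json())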