CLIP and simple-clip
The official OpenAI implementation provides the reference model and weights; simple-clip is a minimal PyTorch reimplementation of it, aimed at educational and resource-constrained use.
About CLIP
openai/CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet for a given image
Trained on 400M image-text pairs using contrastive learning, CLIP jointly encodes images and text into a shared embedding space where cosine similarity enables zero-shot classification without task-specific fine-tuning. Built in PyTorch around a Transformer text encoder paired with either a Vision Transformer or ResNet image encoder, it integrates with torchvision for preprocessing and ships multiple model scales (ViT-B/32, ViT-L/14, etc.) for deployment flexibility.
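The zero-shot mechanism described above reduces to simple linear algebra once both encoders have run: normalize the embeddings, take scaled dot products, and softmax over the candidate captions. The sketch below illustrates this with toy NumPy vectors standing in for encoder outputs; the dimensions and the `logit_scale` value are illustrative assumptions, not constants read from either repository.

```python
import numpy as np

def normalize(x):
    # Project embeddings onto the unit sphere so dot products equal cosine similarity
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy stand-ins for encoder outputs (real CLIP features are 512-d or larger)
rng = np.random.default_rng(0)
image_embedding = normalize(rng.normal(size=(1, 8)))   # one encoded image
text_embeddings = normalize(rng.normal(size=(3, 8)))   # three candidate captions

# Zero-shot classification: cosine similarities scaled by a temperature,
# then a softmax over the candidate texts
logit_scale = 100.0  # assumed value; CLIP learns this scale during training
logits = logit_scale * image_embedding @ text_embeddings.T   # shape (1, 3)
shifted = np.exp(logits - logits.max())
probs = shifted / shifted.sum()
```

The caption with the highest probability is the zero-shot prediction; swapping in a different caption set re-targets the classifier with no fine-tuning.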
About simple-clip
filipbasara0/simple-clip
A minimal, but effective implementation of CLIP (Contrastive Language-Image Pretraining) in PyTorch
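The core of any CLIP implementation, minimal or not, is the symmetric contrastive (InfoNCE) objective: matching image-text pairs sit on the diagonal of a batch-by-batch similarity matrix, and cross-entropy is averaged over both the image-to-text and text-to-image directions. The NumPy sketch below shows the shape of that computation; it is an assumed illustration of the general objective, not code taken from simple-clip.

```python
import numpy as np

def clip_loss(image_emb, text_emb, logit_scale=100.0):
    """Symmetric InfoNCE loss over a batch of paired embeddings."""
    # Normalize so dot products are cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (batch, batch) similarity matrix; matching pairs lie on the diagonal
    logits = logit_scale * image_emb @ text_emb.T
    labels = np.arange(len(logits))

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()  # diagonal = correct pairs

    # Average the image→text and text→image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing this loss pulls each image toward its own caption and pushes it away from every other caption in the batch, which is what produces the shared embedding space used for zero-shot classification.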