open_clip and cliport
CLIPort builds upon open_clip by using CLIP embeddings as its vision-language foundation for robotic manipulation tasks, making them complements rather than competitors.
About open_clip
mlfoundations/open_clip
An open source implementation of CLIP.
Supports diverse Vision Transformer and ConvNet architectures trained on large-scale datasets (LAION-2B, DataComp-1B), with published scaling laws and zero-shot ImageNet accuracy of up to 85.4%. Integrates with PyTorch, the Hugging Face model hub, and timm image encoders, and pairs with the clip-retrieval library for efficient embedding computation. Models load from local checkpoints or the Hugging Face Hub, with pre-trained weights suited to both inference and fine-tuning workflows.
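A minimal usage sketch, following the zero-shot classification example in the open_clip README (the image path is a placeholder; any model/pretrained pair returned by open_clip.list_pretrained() works):

```python
# Zero-shot classification with open_clip, adapted from the project README.
# "cat.jpg" is a placeholder image path.
import torch
from PIL import Image
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
model.eval()
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    # Encode both modalities into the shared embedding space.
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    # Cosine-similarity logits, softmaxed into probabilities per prompt.
    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```

Models published on the Hugging Face Hub load through the same call with an hf-hub: prefix, e.g. open_clip.create_model_and_transforms("hf-hub:laion/CLIP-ViT-g-14-laion2B-s12B-b42K").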
About cliport
cliport/cliport
CLIPort: What and Where Pathways for Robotic Manipulation
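CLIPort pairs a semantic "what" pathway, built on frozen CLIP features, with a spatial "where" pathway derived from Transporter networks, fusing them into dense language-conditioned affordance maps. The sketch below illustrates that two-stream idea only; the class name, the small convolutional stack, and the fusion head are hypothetical stand-ins, not the repo's actual models:

```python
# Conceptual sketch of a CLIPort-style two-stream network. All module
# names and layer sizes here are illustrative, not from cliport/models/.
import torch
import torch.nn as nn
import open_clip


class TwoStreamSketch(nn.Module):
    """Fuses a CLIP 'what' pathway with a spatial 'where' pathway."""

    def __init__(self):
        super().__init__()
        # Semantic stream: a frozen CLIP model supplies language features.
        self.clip, _, _ = open_clip.create_model_and_transforms(
            "ViT-B-32", pretrained="laion2b_s34b_b79k")
        self.clip.eval()
        for p in self.clip.parameters():
            p.requires_grad = False
        self.tokenizer = open_clip.get_tokenizer("ViT-B-32")
        # Spatial stream: a toy conv net standing in for Transporter-style
        # dense feature extraction (hypothetical architecture).
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        # Fusion head producing a dense pick/place affordance map.
        self.fuse = nn.Conv2d(64 + 512, 1, 1)

    def forward(self, rgb, instructions):
        # rgb: (B, 3, H, W) observation; instructions: list of B strings.
        tokens = self.tokenizer(instructions)
        with torch.no_grad():
            lang = self.clip.encode_text(tokens)           # (B, 512)
        spatial = self.spatial(rgb)                         # (B, 64, H, W)
        # Tile the language embedding over every spatial location.
        lang_map = lang[:, :, None, None].expand(
            -1, -1, rgb.shape[2], rgb.shape[3])             # (B, 512, H, W)
        return self.fuse(torch.cat([spatial, lang_map], 1))  # (B, 1, H, W)


net = TwoStreamSketch()
affordance = net(torch.rand(1, 3, 64, 64),
                 ["pack the red block into the brown box"])
print(affordance.shape)  # torch.Size([1, 1, 64, 64])
```

Keeping CLIP frozen, as in the sketch, mirrors CLIPort's design choice of preserving the pre-trained vision-language alignment while training only the spatial and fusion components.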