open_clip and AlphaCLIP

AlphaCLIP builds on CLIP by adding an auxiliary alpha-channel (mask) input that steers the vision encoder toward user-specified regions, making it an enhanced variant rather than a direct competitor.

                   open_clip          AlphaCLIP
Score              86 (Verified)      36 (Emerging)
Maintenance        16/25              2/25
Adoption           25/25              10/25
Maturity           25/25              9/25
Community          20/25              15/25
Stars              13,496             869
Forks              1,253              58
Downloads          2,903,706          -
Commits (30d)      1                  0
Language           Python             Jupyter Notebook
License            -                  Apache-2.0
Risk flags         No risk flags      Stale 6m, No Package, No Dependents

About open_clip

mlfoundations/open_clip

An open source implementation of CLIP.

Supports diverse Vision Transformer and ConvNet architectures trained on large-scale datasets (LAION-2B, DataComp-1B) with published scaling laws, achieving competitive zero-shot ImageNet accuracy up to 85.4%. Integrates with PyTorch, Hugging Face model hub, and timm for image encoders, enabling efficient embedding computation via the clip-retrieval library. Offers flexible model loading from local checkpoints or HuggingFace, with pre-trained weights optimized for both inference and fine-tuning workflows.
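For reference, a minimal zero-shot classification sketch using open_clip's documented API; the "ViT-B-32" / "laion2b_s34b_b79k" tag is one of the published pretrained weights, and the image path is a placeholder:

```python
# Zero-shot image classification with open_clip (PyTorch).
import torch
from PIL import Image
import open_clip

# Load a pretrained model plus its matching preprocessing transforms and tokenizer.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder image path
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between image and caption embeddings, softmaxed over captions.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # one probability per candidate caption
```

The same loading call accepts a local checkpoint path via the pretrained argument, which is how locally fine-tuned weights are reused for inference.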

About AlphaCLIP

SunzeY/AlphaCLIP

[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Incorporates alpha-channel (transparency/mask) conditioning into CLIP's vision encoder, enabling region-focused feature extraction by accepting binary foreground masks alongside images. Built on LoRA-based fine-tuning of standard CLIP backbones (ViT-B/16, ViT-L/14) trained on the MaskImageNet dataset. Integrates seamlessly with downstream applications like Stable Diffusion, LLaVA, and BLIP for improved performance in masked image understanding, zero-shot classification, and vision-language tasks.
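To illustrate the alpha-channel conditioning, here is a minimal sketch modeled on the usage shown in the SunzeY/AlphaCLIP README; the alpha_clip.load signature, the checkpoint filename, the mask normalization constants, and the model.visual(image, alpha) call are assumptions taken from that README and may differ across releases:

```python
# Region-focused feature extraction with Alpha-CLIP (sketch; API names assumed
# from the SunzeY/AlphaCLIP README and may differ across releases).
import numpy as np
import torch
from PIL import Image
from torchvision import transforms
import alpha_clip  # provided by the AlphaCLIP repository, not on PyPI

# Load a CLIP backbone plus the alpha-conditioned vision weights.
# The checkpoint filename below is hypothetical; use one from the repo's model zoo.
model, preprocess = alpha_clip.load(
    "ViT-B/16",
    alpha_vision_ckpt_pth="clip_b16_alpha_checkpoint.pth",  # hypothetical filename
    device="cpu",
)

# The binary foreground mask is resized to the vision input size and normalized.
mask_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((224, 224)),
    transforms.Normalize(0.5, 0.26),
])

image = preprocess(Image.open("example.jpg")).unsqueeze(0)           # RGB image (placeholder path)
mask = np.array(Image.open("example_mask.png").convert("L")) > 127   # foreground = True
alpha = mask_transform((mask * 255).astype(np.uint8)).unsqueeze(0)

with torch.no_grad():
    # The vision tower takes the image and its alpha mask and returns features
    # biased toward the masked region.
    region_features = model.visual(image, alpha)
```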

Scores updated daily from GitHub, PyPI, and npm data.