OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Trained on ~200M Chinese image-text pairs, it combines vision encoders (ResNet50 to ViT-H) with RoBERTa/RBT3 text encoders optimized for Chinese semantic alignment. The framework supports multiple deployment formats (ONNX, TensorRT, CoreML) and includes advanced training techniques like FlashAttention, gradient accumulation, and knowledge distillation for efficient fine-tuning on downstream tasks.
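The retrieval mechanism the description refers to can be illustrated with a small numpy sketch of CLIP-style scoring: embeddings from the vision and text encoders are L2-normalized, similarity is a scaled dot product, and a softmax over the text candidates ranks them for a given image. The dimensions, temperature, and data below are illustrative placeholders, not Chinese-CLIP's actual values.

```python
# Minimal sketch of CLIP-style cross-modal retrieval scoring.
# Assumption: embeddings are plain dense vectors; sizes and the
# temperature are illustrative, not taken from Chinese-CLIP.
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize rows so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieval_probs(image_emb: np.ndarray,
                    text_embs: np.ndarray,
                    temperature: float = 0.07) -> np.ndarray:
    """Softmax over scaled cosine similarities between one image and N texts."""
    sims = normalize(text_embs) @ normalize(image_emb)
    logits = sims / temperature
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

rng = np.random.default_rng(0)
image = rng.normal(size=64)
texts = rng.normal(size=(3, 64))
texts[1] = image + 0.1 * rng.normal(size=64)  # make text 1 the best match
probs = retrieval_probs(image, texts)
best = int(np.argmax(probs))
```

Retrieval then amounts to sorting candidates by these probabilities; the actual model produces the embeddings with its ViT/ResNet and RoBERTa encoders.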
5,820 stars. No commits in the last 6 months.
Stars: 5,820
Forks: 548
Language: Jupyter Notebook
License: MIT
Last pushed: Aug 29, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OFA-Sys/Chinese-CLIP"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
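The same lookup can be scripted with the Python standard library. Only the endpoint URL comes from the curl example above; the shape of the JSON response is an assumption, so inspect it before relying on specific field names.

```python
# Sketch of calling the quality endpoint shown in the curl example.
# Assumption: the response body is JSON; its field names are unknown.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem: str, owner: str, repo: str) -> dict:
    """GET the quality record for a repository and parse it as JSON."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

url = quality_url("transformers", "OFA-Sys", "Chinese-CLIP")
# data = fetch_quality("transformers", "OFA-Sys", "Chinese-CLIP")  # network call
```

Unauthenticated calls are limited to 100 requests/day, so cache responses if you poll many repositories.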
Related models
kastalimohammed1965/CLIP-fine-tune-registers-gated
Vision Transformers Need Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!
Kaushalya/medclip
A multi-modal CLIP model trained on the medical dataset ROCO
BUAADreamer/SPN4CIR
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives...
clip-italian/clip-italian
CLIP (Contrastive Language-Image Pre-training) for Italian
zer0int/CLIP-fine-tune-registers-gated
Vision Transformers Need Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!