OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Trained on ~200M Chinese image-text pairs, it combines vision encoders (ResNet50 to ViT-H) with RoBERTa/RBT3 text encoders optimized for Chinese semantic alignment. The framework supports multiple deployment formats (ONNX, TensorRT, CoreML) and includes advanced training techniques like FlashAttention, gradient accumulation, and knowledge distillation for efficient fine-tuning on downstream tasks.
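The retrieval mechanism the description refers to can be illustrated with a small numpy sketch of CLIP-style scoring: embeddings from the vision and text encoders are L2-normalized, similarity is a scaled dot product, and a softmax over the text candidates ranks them for a given image. The dimensions, temperature, and data below are illustrative placeholders, not Chinese-CLIP's actual values.

```python
# Minimal sketch of CLIP-style cross-modal retrieval scoring.
# Assumption: embeddings are plain dense vectors; sizes and the
# temperature are illustrative, not taken from Chinese-CLIP.
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize rows so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def retrieval_probs(image_emb: np.ndarray,
                    text_embs: np.ndarray,
                    temperature: float = 0.07) -> np.ndarray:
    """Softmax over scaled cosine similarities between one image and N texts."""
    sims = normalize(text_embs) @ normalize(image_emb)
    logits = sims / temperature
    logits -= logits.max()          # numerical stability
    p = np.exp(logits)
    return p / p.sum()

rng = np.random.default_rng(0)
image = rng.normal(size=64)
texts = rng.normal(size=(3, 64))
texts[1] = image + 0.1 * rng.normal(size=64)  # make text 1 the best match
probs = retrieval_probs(image, texts)
best = int(np.argmax(probs))
```

Retrieval then amounts to sorting candidates by these probabilities; the actual model produces the embeddings with its ViT/ResNet and RoBERTa encoders.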
5,820 stars. No commits in the last 6 months.
Stars: 5,820
Forks: 548
Language: Jupyter Notebook
License: MIT
Last pushed: Aug 29, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/OFA-Sys/Chinese-CLIP"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
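The same lookup can be scripted with the Python standard library. Only the endpoint URL comes from the curl example above; the shape of the JSON response is an assumption, so inspect it before relying on specific field names.

```python
# Sketch of calling the quality endpoint shown in the curl example.
# Assumption: the response body is JSON; its field names are unknown.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem: str, owner: str, repo: str) -> dict:
    """GET the quality record for a repository and parse it as JSON."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

url = quality_url("transformers", "OFA-Sys", "Chinese-CLIP")
# data = fetch_quality("transformers", "OFA-Sys", "Chinese-CLIP")  # network call
```

Unauthenticated calls are limited to 100 requests/day, so cache responses if you poll many repositories.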
Related models
kastalimohammed1965/CLIP-fine-tune-registers-gated
Vision Transformers Need Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!
Kaushalya/medclip
A multi-modal CLIP model trained on the medical dataset ROCO
BUAADreamer/SPN4CIR
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives...
clip-italian/clip-italian
CLIP (Contrastive Language-Image Pre-training) for Italian
zer0int/CLIP-fine-tune-registers-gated
Vision Transformers Need Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!