ABaldrati/CLIP4Cir
[ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features
Implements a two-stage training pipeline: task-oriented fine-tuning of CLIP's vision and text encoders with a contrastive loss, followed by training a learnable Combiner network that fuses the multimodal features through adaptive weighting and residual composition. Targets composed image retrieval on the FashionIQ and CIRR datasets, where a query combines a reference image with a textual modification, and achieves state-of-the-art results by bridging the gap between CLIP's general pre-training and the task-specific requirements.
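The Combiner idea can be sketched in a few lines: project the image and text features, compute an adaptive mixing weight, and add a learned residual before normalizing for cosine retrieval. This is a minimal illustrative sketch with random, untrained weights; the layer names, sizes, and exact fusion formula are assumptions, not the repo's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

class Combiner:
    """Toy sketch of a combiner fusing CLIP image/text features.

    Illustrative only: weights are random and untrained, and the
    structure is an assumption based on the description above.
    """
    def __init__(self, dim=640, hidden=512):
        self.w_img = rng.normal(0, 0.02, (dim, hidden))
        self.w_txt = rng.normal(0, 0.02, (dim, hidden))
        self.w_out = rng.normal(0, 0.02, (2 * hidden, dim))
        self.w_gate = rng.normal(0, 0.02, (2 * hidden, 1))

    def __call__(self, img_feat, txt_feat):
        hi = np.maximum(img_feat @ self.w_img, 0)   # ReLU image projection
        ht = np.maximum(txt_feat @ self.w_txt, 0)   # ReLU text projection
        h = np.concatenate([hi, ht], axis=-1)
        lam = 1 / (1 + np.exp(-(h @ self.w_gate)))  # adaptive mixing weight in (0, 1)
        residual = h @ self.w_out                   # learned composition term
        combined = lam * txt_feat + (1 - lam) * img_feat + residual
        return l2_normalize(combined)               # unit norm for cosine-similarity retrieval

combiner = Combiner()
out = combiner(rng.normal(size=(4, 640)), rng.normal(size=(4, 640)))
print(out.shape)  # (4, 640)
```

At retrieval time, the combined query vector is matched against the normalized CLIP features of the candidate gallery images by cosine similarity.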
192 stars. No commits in the last 6 months.
Stars: 192
Forks: 16
Language: Python
License: MIT
Category:
Last pushed: Sep 05, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/ABaldrati/CLIP4Cir"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
Higher-rated alternatives
OBA-Research/VAAS
VAAS is an inference-first, research-driven library for image integrity analysis. It integrates...
deepmancer/clip-object-detection
Zero-shot object detection with CLIP, utilizing Faster R-CNN for region proposals.
IvanAer/G-Universal-CLIP
4th place solution for the Google Universal Image Embedding Kaggle Challenge. Instance-Level...
joanrod/ocr-vqgan
OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in...