ABaldrati/CLIP4Cir
[ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features
Implements a two-stage training pipeline: task-oriented fine-tuning of CLIP's vision and text encoders with a contrastive loss, followed by training a learnable Combiner network that fuses the multimodal features through adaptive weighting and residual composition. Targets composed image retrieval on the FashionIQ and CIRR datasets, where a query combines a reference image with a textual modification, and achieves state-of-the-art results by bridging the gap between CLIP's general pre-training and the task-specific requirements.
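The Combiner idea can be sketched in a few lines: project the image and text features, compute an adaptive mixing weight, and add a learned residual before normalizing for cosine retrieval. This is a minimal illustrative sketch with random, untrained weights; the layer names, sizes, and exact fusion formula are assumptions, not the repo's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

class Combiner:
    """Toy sketch of a combiner fusing CLIP image/text features.

    Illustrative only: weights are random and untrained, and the
    structure is an assumption based on the description above.
    """
    def __init__(self, dim=640, hidden=512):
        self.w_img = rng.normal(0, 0.02, (dim, hidden))
        self.w_txt = rng.normal(0, 0.02, (dim, hidden))
        self.w_out = rng.normal(0, 0.02, (2 * hidden, dim))
        self.w_gate = rng.normal(0, 0.02, (2 * hidden, 1))

    def __call__(self, img_feat, txt_feat):
        hi = np.maximum(img_feat @ self.w_img, 0)   # ReLU image projection
        ht = np.maximum(txt_feat @ self.w_txt, 0)   # ReLU text projection
        h = np.concatenate([hi, ht], axis=-1)
        lam = 1 / (1 + np.exp(-(h @ self.w_gate)))  # adaptive mixing weight in (0, 1)
        residual = h @ self.w_out                   # learned composition term
        combined = lam * txt_feat + (1 - lam) * img_feat + residual
        return l2_normalize(combined)               # unit norm for cosine-similarity retrieval

combiner = Combiner()
out = combiner(rng.normal(size=(4, 640)), rng.normal(size=(4, 640)))
print(out.shape)  # (4, 640)
```

At retrieval time, the combined query vector is matched against the normalized CLIP features of the candidate gallery images by cosine similarity.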
192 stars. No commits in the last 6 months.
Stars: 192
Forks: 16
Language: Python
License: MIT
Category:
Last pushed: Sep 05, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/ABaldrati/CLIP4Cir"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
Higher-rated alternatives
OBA-Research/VAAS
VAAS is an inference-first, research-driven library for image integrity analysis. It integrates...
deepmancer/clip-object-detection
Zero-shot object detection with CLIP, utilizing Faster R-CNN for region proposals.
IvanAer/G-Universal-CLIP
4th place solution for the Google Universal Image Embedding Kaggle Challenge. Instance-Level...
joanrod/ocr-vqgan
OCR-VQGAN, a discrete image encoder (tokenizer and detokenizer) for figure images in...