DRSY/MoTIS
[NAACL 2022] Mobile text-to-image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)
Implements knowledge-distilled dual encoders (6-12 layer Transformers) that match CLIP's retrieval quality on MS COCO while shrinking model size to 85-146 MB and speeding up inference by 1.6-2.9x through layer pruning and supervised distillation. Provides multiple indexing strategies (linear scan, KMeans, Spotify Annoy) with lazy loading: high-resolution images are encoded in the background while thumbnails are displayed. CLIP's tokenizer and preprocessing pipeline are ported to native Swift/iOS, with the model exported via TorchScript.
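Of the indexing strategies listed above, the linear scan is the simplest baseline: score every stored image embedding against the query embedding and take the top matches. A minimal self-contained sketch of that idea, using toy vectors in place of real CLIP embeddings (all names and dimensions here are illustrative, not from the repo):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def linear_scan(query_emb, image_embs, top_k=3):
    # Exhaustively score every gallery embedding against the query
    # and return the indices of the top_k matches, best first.
    order = sorted(range(len(image_embs)),
                   key=lambda i: cosine(query_emb, image_embs[i]),
                   reverse=True)
    return order[:top_k]

# Toy 3-d "embeddings" standing in for encoder outputs.
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.05, 0.0]
print(linear_scan(query, gallery, top_k=2))  # → [0, 2]
```

Approximate indexes such as KMeans partitioning or Annoy's random-projection trees trade a little recall for sub-linear query time, which matters once the on-device photo library grows past a few thousand images.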
126 stars. No commits in the last 6 months.
Stars: 126
Forks: 10
Language: Swift
License: —
Category: —
Last pushed: May 11, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/DRSY/MoTIS"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
Higher-rated alternatives
unum-cloud/UForm
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts,...
rom1504/clip-retrieval
Easily compute clip embeddings and build a clip retrieval system with them
mazzzystar/Queryable
Run OpenAI's CLIP and Apple's MobileCLIP model on iOS to search photos.
s-emanuilov/litepali
LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing,...
Ubaida-M-Yusuf/Makimus-AI
AI-powered media search — find images and videos using natural language or visual queries