DRSY/MoTIS
[NAACL 2022] Mobile text-to-image search powered by multimodal semantic representation models (e.g., OpenAI's CLIP)
Implements knowledge-distilled dual encoders (6-12 layer Transformers) that match CLIP's retrieval quality on MS COCO while shrinking model size to 85-146 MB and speeding up inference by 1.6-2.9x through layer pruning and supervised distillation. Provides multiple indexing strategies (linear scan, KMeans, Spotify Annoy) with lazy loading: high-resolution images are encoded in the background while thumbnails are displayed. CLIP's tokenizer and preprocessing pipeline are ported to native Swift/iOS, with the model exported via TorchScript.
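Of the indexing strategies listed above, the linear scan is the simplest baseline: score every stored image embedding against the query embedding and take the top matches. A minimal self-contained sketch of that idea, using toy vectors in place of real CLIP embeddings (all names and dimensions here are illustrative, not from the repo):

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def linear_scan(query_emb, image_embs, top_k=3):
    # Exhaustively score every gallery embedding against the query
    # and return the indices of the top_k matches, best first.
    order = sorted(range(len(image_embs)),
                   key=lambda i: cosine(query_emb, image_embs[i]),
                   reverse=True)
    return order[:top_k]

# Toy 3-d "embeddings" standing in for encoder outputs.
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.05, 0.0]
print(linear_scan(query, gallery, top_k=2))  # → [0, 2]
```

Approximate indexes such as KMeans partitioning or Annoy's random-projection trees trade a little recall for sub-linear query time, which matters once the on-device photo library grows past a few thousand images.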
126 stars. No commits in the last 6 months.
Stars: 126
Forks: 10
Language: Swift
License: —
Category: —
Last pushed: May 11, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/DRSY/MoTIS"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
Higher-rated alternatives
unum-cloud/UForm
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts,...
rom1504/clip-retrieval
Easily compute clip embeddings and build a clip retrieval system with them
mazzzystar/Queryable
Run OpenAI's CLIP and Apple's MobileCLIP model on iOS to search photos.
s-emanuilov/litepali
LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing,...
Ubaida-M-Yusuf/Makimus-AI
AI-powered media search — find images and videos using natural language or visual queries