unum-cloud/UForm

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️

/ 100

Established

UForm combines Matryoshka-style embeddings, truncatable down to 64 dimensions, with quantization-aware training for fast semantic search via USearch. Its generative models pair ViT encoders with compact language models (Qwen, LLaMA) for image captioning and VQA. It exports native ONNX models with bfloat16 support and offers Python, JavaScript, and Swift SDKs for deployment from servers to mobile devices.
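To make the Matryoshka idea concrete, here is a minimal sketch of how a truncated embedding can be used for search. It is not UForm's or USearch's API: the function names are hypothetical, the 256-dimensional toy vectors are an assumption, and the brute-force scan only stands in for a real vector index. The sketch assumes the model was trained so that the leading components of each embedding carry most of the signal, which is what lets you keep only the first 64 values.

```python
import math

def truncate_and_normalize(embedding, dims=64):
    """Keep the first `dims` components and rescale them to unit length."""
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]

def cosine_similarity(a, b):
    """Dot product of two unit-length vectors of equal size."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def search(query, corpus, dims=64, top_k=3):
    """Brute-force nearest neighbours over truncated embeddings.

    Returns a list of (corpus_index, similarity) pairs, best match first.
    """
    q = truncate_and_normalize(query, dims)
    scored = [
        (cosine_similarity(q, truncate_and_normalize(vec, dims)), idx)
        for idx, vec in enumerate(corpus)
    ]
    scored.sort(reverse=True)
    return [(idx, score) for score, idx in scored[:top_k]]

# Toy data: hypothetical 256-dimensional embeddings, mostly zeros.
corpus = [
    [1.0] + [0.0] * 255,
    [0.0, 1.0] + [0.0] * 254,
    [1.0, 1.0] + [0.0] * 254,
]
query = [1.0, 0.1] + [0.0] * 254
results = search(query, corpus, dims=64, top_k=2)
```

Renormalizing after truncation matters because a prefix of a unit vector is generally shorter than unit length, and skipping that step would skew the cosine scores. In practice the linear scan would be replaced by an approximate index once the corpus grows.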

1,221 stars and 2,280 monthly downloads. Available on PyPI.

Maintenance 6 / 25

Adoption 18 / 25

Maturity 18 / 25

Community 16 / 25


Stars 1,221

Forks

Language Python

License Apache-2.0

Related tools

rom1504/clip-retrieval

Easily compute clip embeddings and build a clip retrieval system with them

mazzzystar/Queryable

Run OpenAI's CLIP and Apple's MobileCLIP model on iOS to search photos.

Ubaida-M-Yusuf/Makimus-AI

AI-powered media search — find images and videos using natural language or visual queries

s-emanuilov/litepali

LitePali is a minimal, efficient implementation of ColPali for image retrieval and indexing,...

HEGOM61ita/OffGallery

AI cataloger for photographic images · Compatible with Lightroom - Automatic tags, Metadata...
