Uncategorized Multimodal AI Tools

There are 39 uncategorized tools tracked. One scores above 70 (Verified tier). The highest-rated is starVLA/starVLA at 71/100 with 1,702 stars. Three of the top 10 are actively maintained.

Get the project list as JSON. The request below sets limit=20, so it returns at most 20 of the 39 projects.

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=multimodal&subcategory=uncategorized&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
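For scripted access, the curl request above can be rebuilt in Python using only the standard library. This is a minimal sketch under stated assumptions: the helper names build_url and fetch_json are hypothetical, the response is decoded as generic JSON because its schema is not described on this page, and passing limit=39 to get every tool assumes the endpoint accepts that value (not confirmed here). No API key is sent, because the page does not say how a key is supplied.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str, subcategory: str, limit: int) -> str:
    """Build the same query URL as the curl example, with a configurable limit."""
    # urlencode keeps the dict's insertion order, so the query string matches the curl call.
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_json(url: str, timeout: float = 30.0):
    """Download the URL and parse the body as JSON.

    Hypothetical helper: the response schema is not documented on this page,
    so the parsed value is returned as-is without assuming any fields.
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


# Assumption: the API accepts limit=39 and returns all tracked tools.
url = build_url("multimodal", "uncategorized", 39)
print(url)
```

The script only prints the URL, so it runs offline; calling fetch_json(url) performs the actual request and counts against the 100 requests/day anonymous quota.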

#	Tool and description	Score (/100)	Tier	Stars	Language
1	starVLA/starVLA StarVLA: A Lego-like Codebase for Vision-Language-Action Model Developing	71	Verified	1,702	Python
2	vortex-data/vortex An extensible, state-of-the-art framework for columnar compression, and the...	69	Established	2,853	Rust
3	motis-project/motis multimodal routing, geocoding, and map tiles	64	Established	491	C++
4	zai-org/GLM-V GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with...	64	Established	2,266	Python
5	neka-nat/cad3dify 2D to 3D CAD Conversion Using VLM	61	Established	247	Python
6	batmanlab/Mammo-CLIP [MICCAI 2024, top 11%] Official PyTorch implementation of Mammo-CLIP: A...	59	Established	90	Python
7	opendatalab/mineru-vl-utils A Python package for interacting with the MinerU Vision-Language Model.	57	Established	109	Python
8	EMob-Lab/MnMS Agent-based Multimodal Urban Mobility Simulator resulting from the ERC MAGnUM project	51	Established	20	Python
9	GerrySant/multimodalhugs MultimodalHugs is an extension of Hugging Face that offers a generalized...	51	Established	15	Python
10	withceleste/celeste-python Open source, type-safe primitives for multi-modal AI. All modalities, all...	50	Established	219	Python
11	cloudglue/cloudglue-js Official JavaScript / TypeScript SDK for Cloudglue API	48	Emerging	5	TypeScript
12	EvolvingLMMs-Lab/LongVT [CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling	47	Emerging	217	Python
13	om-ai-lab/GroundVLP GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language...	46	Emerging	74	Jupyter Notebook
14	Jinfeng-Xu/Awesome-Multimodal-Recommender-Systems [TMM'26] Continuously Updated Awesome Multimodal Recommendation Paper List	45	Emerging	97	—
15	anam-org/metaxy Pluggable sample-level metadata versioning for incremental multimodal pipelines.	45	Emerging	89	Python
16	eduardosanzb/escribano AI-powered session intelligence tool - transcribes Cap recordings with Whisper	44	Emerging	5	TypeScript
17	yunncheng/MMRL [CVPR 2025 & IJCV 2026] Official PyTorch Code for "MMRL: Multi-Modal...	43	Emerging	102	Python
18	Mellow-Artificial-Intelligence/open-xtract Extract structured data from documents, images, audio, and video using LLMs.	43	Emerging	16	Python
19	ComfyUI-Kelin/ComfyUI-LLMs-Toolkit ComfyUI custom nodes for DeepSeek, Qwen, GPT, and other OpenAI-compatible...	43	Emerging	19	Python
20	MING-ZCH/CII-Bench [ACL 2025] Can MLLMs Understand the Deep Implication Behind Chinese Images?	38	Emerging	21	Python
21	mturan33/isaac-g1-ulc Low Level RL Controller for G1	38	Emerging	11	Python
22	Jinfeng-Xu/Multimodal-Recommendation-Library A Continuously Updated Library for Advanced Models for Multimodal Recommendation	37	Emerging	3	Python
23	nguyennm1024/OSCaR 🔥🔥🔥 Object State Description & Change Detection	34	Emerging	10	Python
24	AZIRARM/nodify Nodify is a powerful and flexible headless content management system (CMS)...	33	Emerging	8	Java
25	winstxnhdw/telegroq A serverless invite-only AI-powered chat bot on Telegram.	33	Emerging	10	TypeScript
26	ai-akashic/Memorose Next-generation self-evolving multimodal memory brain.	30	Emerging	24	Rust
27	Henry-Who321/RAdaR RAdaR is an RL-native adaptive reasoning framework for VLMs that dynamically...	29	Experimental	12	Python
28	video-db/skills Server-side video workflows for agents: ingest, understand, search, edit, stream.	29	Experimental	65	Python
29	samletnorge/machine-core A flexible agent framework for building AI agents with MCP (Model Context...	29	Experimental	3	Python
30	Air00100/domain-normalizer 🌐 Normalize and parse domain names from messy input, cleaning errors and...	29	Experimental	3	Go
31	microsoft/AsgardBench Visually grounded planning benchmark for multimodal agents	27	Experimental	3	Python
32	TLtanium/meta-lingo-electron Meta-Lingo is a comprehensive desktop application designed for corpus...	27	Experimental	4	HTML
33	mturan33/isaac-g1-vlm VLM-RL Hierarchical Loco-Manipulation For Long-Horizon Tasks With G1 robot...	27	Experimental	4	Python
34	yc-cui/LLaRS Multi-modal remote sensing image restoration and fusion foundation model...	25	Experimental	3	Python
35	iLearn-Lab/ACMMM24-AD-DRL The PyTorch implementation of AD-DRL	25	Experimental	7	Python
36	Krisocer/FigureWeave Generate editable scientific SVG figures from method text with local SAM3...	25	Experimental	4	Python
37	wendell0218/Awesome-Motion-Datasets A curated list of motion-related datasets	24	Experimental	5	—
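38	Eganchiyu/Yuki-Chan-Bot 🌸 An asynchronous AI assistant built on DeepSeek-V3: a "digital little sister" with a "lifelike" energy system, dual-pool RAG long-term memory, and multimodal visual perception	19	Experimental	3	Python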
39	step-out/Multimodal-Model-Zoo A curated collection of 100+ multimodal large language models	17	Experimental	3	CSS
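The page does not state the score cutoffs behind each tier. As a rough check, the scores in the table can be grouped by their listed tier to see the range each tier actually covers. This sketch uses only the values shown in the table; the observed ranges (for example, Established spans 50 to 69 and Emerging spans 30 to 48) are consistent with cutoffs near 70, 50 and 30, but that is an inference from this list, not a documented rule. The helper name observed_bands is hypothetical.

```python
# Scores copied from the table above, grouped by the tier shown in its Tier column.
SCORES_BY_TIER = {
    "Verified": [71],
    "Established": [69, 64, 64, 61, 59, 57, 51, 51, 50],
    "Emerging": [48, 47, 46, 45, 45, 44, 43, 43, 43, 38, 38, 37, 34, 33, 33, 30],
    "Experimental": [29, 29, 29, 29, 27, 27, 27, 25, 25, 25, 24, 19, 17],
}


def observed_bands(scores_by_tier):
    """Return (count, lowest, highest) per tier.

    These are only the ranges observed in this list; the real tier
    cutoffs are not documented on the page.
    """
    return {tier: (len(s), min(s), max(s)) for tier, s in scores_by_tier.items()}


bands = observed_bands(SCORES_BY_TIER)
for tier, (count, lowest, highest) in bands.items():
    print(f"{tier:<12} n={count:<2} scores {lowest}-{highest}")

total = sum(count for count, _, _ in bands.values())
print("total:", total)  # should equal the 39 tools tracked
```

Because the bands do not overlap in this list, a single score threshold per tier would reproduce every label shown, but a larger sample could reveal exceptions.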