jingyaogong/minimind-v
🚀 Train a 26M-parameter visual multimodal VLM from scratch in just 1 hour!
Combines SigLIP2 vision encoding with a lightweight MLP projection to fuse image features into a compact language-model backbone, supporting both dense and mixture-of-experts architectures. Implements a complete training pipeline, including vision-language pretraining and supervised instruction tuning, with DDP multi-GPU acceleration, bfloat16 mixed precision, and checkpoint resumption across varying hardware configurations. Models range from 26M to 201M parameters with inference footprints under 1.1 GB, packaged in Hugging Face Transformers format for ecosystem compatibility.
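For orientation, the sketch below shows the general shape of this kind of MLP-projection fusion: vision-encoder features are mapped into the language model's embedding space and prepended to the text token embeddings. Module names, dimensions, and the prepend-style fusion are illustrative assumptions, not MiniMind-V's actual code.

# Illustrative sketch only; names and dimensions are assumptions, not the repo's implementation.
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Two-layer MLP that maps vision-encoder features into the LM embedding space."""
    def __init__(self, vision_dim=768, lm_dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, image_features):       # (batch, num_patches, vision_dim)
        return self.mlp(image_features)      # (batch, num_patches, lm_dim)

def fuse(image_features, text_embeds, projector):
    # Projected image tokens are placed ahead of the text token embeddings
    # before the combined sequence is fed to the language-model backbone.
    image_tokens = projector(image_features)               # (B, P, lm_dim)
    return torch.cat([image_tokens, text_embeds], dim=1)   # (B, P + T, lm_dim)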
6,712 stars. Actively maintained with 16 commits in the last 30 days.
Stars: 6,712
Forks: 736
Language: Python
License: Apache-2.0
Category: llm-tools
Last pushed: Feb 04, 2026
Commits (30d): 16
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jingyaogong/minimind-v"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
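The same request from Python, as a minimal sketch using the endpoint shown above; the response is printed as-is because its JSON schema is not documented here.

# Minimal sketch: fetch the quality data for this repo from the public endpoint.
import requests

resp = requests.get(
    "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jingyaogong/minimind-v"
)
resp.raise_for_status()
print(resp.json())  # field names in the payload are not documented here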
Related tools
roboflow/vision-ai-checkup
Take your LLM to the optometrist.
SkyworkAI/Skywork-R1V
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in...
zai-org/GLM-TTS
GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning
NExT-GPT/NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
EvolvingLMMs-Lab/NEO
NEO Series: Native Vision-Language Models from First Principles