jingyaogong/minimind-v

🚀 Train a 26M-parameter visual multimodal VLM (a "large model") from scratch in just 1 hour!

Score: 63 / 100 (Established)

Combines SigLIP2 vision encoding with lightweight MLP projection to fuse image features into a compact language model backbone, supporting both dense and mixture-of-experts architectures. Implements complete training pipeline including vision-language pretraining and supervised instruction-tuning with DDP multi-GPU acceleration, bfloat16 mixed precision, and checkpoint resumption across varying hardware configurations. Models range from 26M to 201M parameters with inference footprints under 1.1GB, packaged in Hugging Face Transformers format for ecosystem compatibility.
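The fusion step described above (vision features passed through a lightweight MLP projection into the language model's embedding space) can be sketched as follows. This is a minimal illustration, not the repo's actual code: the dimensions (768-d SigLIP2 patch features, 512-d LM hidden size) and the two-layer ReLU MLP are assumptions for the example.

```python
import numpy as np

# Assumed dims for illustration: SigLIP2 patch features (768-d here)
# projected into the LM's hidden size (512-d here) via a two-layer MLP.
VISION_DIM, MLP_DIM, LM_DIM = 768, 1024, 512

rng = np.random.default_rng(0)
w1 = rng.standard_normal((VISION_DIM, MLP_DIM)) * 0.02
w2 = rng.standard_normal((MLP_DIM, LM_DIM)) * 0.02

def project(patch_features: np.ndarray) -> np.ndarray:
    """Map [num_patches, VISION_DIM] image features into LM token space."""
    h = np.maximum(patch_features @ w1, 0.0)  # real models use GELU/SiLU; ReLU here
    return h @ w2  # [num_patches, LM_DIM], spliced among text token embeddings

patches = rng.standard_normal((196, VISION_DIM))  # e.g. a 14x14 patch grid
image_tokens = project(patches)
print(image_tokens.shape)  # (196, 512)
```

The projected image tokens are then interleaved with text token embeddings before the language model backbone, which is what lets a small dense or MoE LM attend over visual content.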

6,712 stars. Actively maintained with 16 commits in the last 30 days.

No package. No dependents.
Maintenance: 17 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 20 / 25


Stars: 6,712
Forks: 736
Language: Python
License: Apache-2.0
Last pushed: Feb 04, 2026
Commits (30d): 16

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jingyaogong/minimind-v"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
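The same request can be made from Python with only the standard library. This is a hedged sketch: the endpoint URL comes from the curl example above, but the shape and field names of the JSON payload are not documented here, so the fetch helper just decodes whatever dict comes back.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a GitHub owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """GET the endpoint and decode its JSON body (field names unspecified)."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)

print(quality_url("jingyaogong", "minimind-v"))
```

Callers staying under the anonymous limit (100 requests/day) need no key; a free key raises that to 1,000/day.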