keivalya/mini-vla

a minimal, beginner-friendly VLA to show how robot policies can fuse images, text, and states to generate actions

Score: 57 / 100 (Established)

Implements diffusion-based action generation with separate encoders for vision (images), language (text instructions), and robot state, fused via an MLP before a diffusion policy head—all contained in ~150 lines of core model code. Designed for Meta-World environments with a complete pipeline: expert data collection, training on trajectory datasets, and inference with free-form text instructions. Prioritizes educational clarity and rapid prototyping over production optimization, making it suitable for learning diffusion policies and VLA architecture without heavy framework dependencies.
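To make the fused-encoder idea concrete, here is a minimal PyTorch sketch of that layout. The module names, feature dimensions, and the exact shape of the diffusion head are illustrative assumptions, not the repository's actual code; the real project may use pretrained vision and text backbones.

```python
import torch
import torch.nn as nn

class MiniVLASketch(nn.Module):
    """Toy fusion skeleton: separate encoders for image, text, and state,
    concatenated and fused by an MLP that conditions a diffusion head."""

    def __init__(self, img_dim=512, txt_dim=384, state_dim=39,
                 fused_dim=256, action_dim=4):
        super().__init__()
        # Hypothetical per-modality encoders (placeholders for real backbones).
        self.img_enc = nn.Linear(img_dim, 128)
        self.txt_enc = nn.Linear(txt_dim, 128)
        self.state_enc = nn.Linear(state_dim, 64)
        # MLP fusion of the three modality embeddings.
        self.fuse = nn.Sequential(
            nn.Linear(128 + 128 + 64, fused_dim), nn.ReLU(),
            nn.Linear(fused_dim, fused_dim),
        )
        # Diffusion-style head: predicts the noise added to an action sample,
        # conditioned on the fused embedding and a (normalized) timestep.
        self.denoise = nn.Sequential(
            nn.Linear(fused_dim + action_dim + 1, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, img_feat, txt_feat, state, noisy_action, t):
        # img_feat: (B, img_dim), txt_feat: (B, txt_dim), state: (B, state_dim)
        cond = self.fuse(torch.cat([
            self.img_enc(img_feat),
            self.txt_enc(txt_feat),
            self.state_enc(state),
        ], dim=-1))
        # noisy_action: (B, action_dim), t: (B, 1) diffusion timestep.
        return self.denoise(torch.cat([cond, noisy_action, t], dim=-1))
```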


No published package · No dependents
Maintenance: 13 / 25
Adoption: 10 / 25
Maturity: 13 / 25
Community: 21 / 25

The four 25-point subscores sum to the overall 57 / 100.


Stars: 204
Forks: 40
Language: Python
License: MIT
Last pushed: Mar 17, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/keivalya/mini-vla"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
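If you prefer calling the endpoint from code rather than curl, a minimal Python sketch using only the standard library might look like the following; it assumes the endpoint returns JSON, and the field names are whatever the API actually provides, so inspect the raw response first.

```python
import json
import urllib.request

# Public endpoint shown above for this repository's quality data.
URL = "https://pt-edge.onrender.com/api/v1/quality/diffusion/keivalya/mini-vla"

# Fetch and decode the response (assumed to be JSON).
with urllib.request.urlopen(URL, timeout=10) as resp:
    data = json.load(resp)

# Pretty-print whatever the API returns.
print(json.dumps(data, indent=2))
```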