taherfattahi/MetaWorld-VLA-openai-clip-vit
A lightweight Vision-Language-Action (VLA) baseline for MetaWorld robot-arm tasks, combining a pretrained CLIP-ViT vision transformer (openai/clip-vit-base-patch32), a small text transformer, and robot-state fusion.
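The repository's code is not shown here, but the described architecture (image embedding + language embedding + proprioceptive state, fused into an action) can be sketched minimally. This is an illustrative NumPy sketch, not the repo's implementation: all dimensions are assumptions (768 matches CLIP ViT-B/32's vision width; 39 and 4 match common MetaWorld observation and action sizes; the text width and MLP are invented), and the weights are random stand-ins for a trained policy head.

```python
import numpy as np

# Hypothetical dimensions -- assumptions, not taken from the repo.
IMG_DIM = 768    # CLIP ViT-B/32 vision embedding width
TXT_DIM = 128    # small text-transformer embedding (assumed)
STATE_DIM = 39   # typical MetaWorld proprioceptive observation size
ACT_DIM = 4      # MetaWorld action space: (dx, dy, dz, gripper)

rng = np.random.default_rng(0)

def fuse_and_act(img_emb, txt_emb, robot_state, W1, b1, W2, b2):
    """Concatenate the three modalities and map them to an action
    with a two-layer MLP (an assumed policy head, not the repo's)."""
    x = np.concatenate([img_emb, txt_emb, robot_state])
    h = np.maximum(0.0, W1 @ x + b1)   # ReLU hidden layer
    return np.tanh(W2 @ h + b2)        # actions squashed to [-1, 1]

# Randomly initialised weights stand in for a trained policy head.
fused = IMG_DIM + TXT_DIM + STATE_DIM
W1, b1 = rng.normal(size=(256, fused)) * 0.01, np.zeros(256)
W2, b2 = rng.normal(size=(ACT_DIM, 256)) * 0.01, np.zeros(ACT_DIM)

action = fuse_and_act(rng.normal(size=IMG_DIM),
                      rng.normal(size=TXT_DIM),
                      rng.normal(size=STATE_DIM),
                      W1, b1, W2, b2)
print(action.shape)  # (4,)
```

In a real VLA pipeline the `img_emb` would come from the frozen or fine-tuned CLIP vision encoder and `txt_emb` from the task-instruction encoder; the sketch only shows the fusion step.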
Stars: 3
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Dec 27, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/taherfattahi/MetaWorld-VLA-openai-clip-vit"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
OFA-Sys/Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Kaushalya/medclip
A multi-modal CLIP model trained on the medical dataset ROCO
kastalimohammed1965/CLIP-fine-tune-registers-gated
Vision Transformers Need Registers. And Gated MLPs. And +20M params. Tiny modality gap ensues!
BUAADreamer/SPN4CIR
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives...
clip-italian/clip-italian
CLIP (Contrastive Language–Image Pre-training) for Italian