OpenGVLab/Instruct2Act
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
Uses an LLM to generate executable Python code that orchestrates perception, planning, and control loops for robotic manipulation. The perception pipeline chains foundation models—SAM for object segmentation and CLIP for classification—via predefined APIs that the LLM can invoke, enabling zero-shot task execution without learning-based policies. Integrates with VIMABench for tabletop manipulation evaluation and supports both task-specific and task-agnostic prompting strategies, with optional pointing-language augmentation for object selection.
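To make the pattern concrete, here is a minimal sketch of the kind of program such an LLM might emit. The function names (segment_objects, clip_scores, pick_place) are hypothetical stand-ins for the repo's SAM, CLIP, and motion-primitive wrappers, whose actual names and signatures are not shown on this page; the stubs just make the sketch self-contained.

# Minimal sketch of the Instruct2Act pattern, NOT the repo's actual API:
# segment_objects, clip_scores, and pick_place are hypothetical stand-ins
# for its SAM, CLIP, and motion-primitive wrappers.
import numpy as np

def segment_objects(rgb: np.ndarray) -> list[np.ndarray]:
    """Stand-in for a SAM wrapper: one boolean mask per detected object."""
    h, w, _ = rgb.shape
    return [np.zeros((h, w), dtype=bool)]  # stub output

def clip_scores(rgb: np.ndarray, masks: list[np.ndarray], query: str) -> np.ndarray:
    """Stand-in for a CLIP wrapper: similarity of each masked crop to `query`."""
    return np.random.rand(len(masks))  # stub scores

def pick_place(pick_mask: np.ndarray, place_mask: np.ndarray) -> None:
    """Stand-in for a motion primitive parameterized by object masks."""
    print(f"pick {int(pick_mask.sum())}px object, place on {int(place_mask.sum())}px region")

# The kind of code the LLM would generate for "put the apple in the bowl":
def put_apple_in_bowl(rgb: np.ndarray) -> None:
    masks = segment_objects(rgb)                                    # SAM: find objects
    apple = int(np.argmax(clip_scores(rgb, masks, "a red apple")))  # CLIP: name them
    bowl = int(np.argmax(clip_scores(rgb, masks, "a bowl")))
    pick_place(masks[apple], masks[bowl])                           # execute primitive

put_apple_in_bowl(np.zeros((64, 64, 3), dtype=np.uint8))

Because the LLM only composes calls to these predefined APIs, no policy training is needed: swapping the instruction swaps the generated program.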
373 stars. No commits in the last 6 months.
Stars: 373
Forks: 22
Language: Python
License: —
Category: —
Last pushed: Jun 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenGVLab/Instruct2Act"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
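For programmatic access, a Python equivalent of the curl call above; this assumes only that the endpoint returns JSON, since the response schema is not documented on this page.

# Python equivalent of the curl example above; assumes the endpoint
# returns JSON (the response schema is not documented here).
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenGVLab/Instruct2Act"
with urllib.request.urlopen(URL) as resp:
    record = json.load(resp)
print(json.dumps(record, indent=2))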
Higher-rated alternatives
MantisAI/sieves
Plug-and-play document AI with zero-shot models.
xiaoya-li/Instruction-Tuning-Survey
Project for the paper "Instruction Tuning for Large Language Models: A Survey"
princeton-pli/STAT
Skill-Targeted Adaptive Training
TencentARC-QQ/TagGPT
TagGPT: Large Language Models are Zero-shot Multimodal Taggers
rafaelpierre/bullet
bullet: a zero-shot/few-shot, LLM-based text classification framework