OpenGVLab/Instruct2Act
Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model
Uses an LLM to generate executable Python code that orchestrates perception, planning, and control loops for robotic manipulation. The perception pipeline chains foundation models—SAM for object segmentation and CLIP for classification—via predefined APIs that the LLM can invoke, enabling zero-shot task execution without learning-based policies. Integrates with VIMABench for tabletop manipulation evaluation and supports both task-specific and task-agnostic prompting strategies, with optional pointing-language augmentation for object selection.
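To make the pattern concrete, here is a minimal sketch of the kind of program such an LLM might emit. The function names (segment_objects, clip_scores, pick_place) are hypothetical stand-ins for the repo's SAM, CLIP, and motion-primitive wrappers, whose actual names and signatures are not shown on this page; the stubs just make the sketch self-contained.

# Minimal sketch of the Instruct2Act pattern, NOT the repo's actual API:
# segment_objects, clip_scores, and pick_place are hypothetical stand-ins
# for its SAM, CLIP, and motion-primitive wrappers.
import numpy as np

def segment_objects(rgb: np.ndarray) -> list[np.ndarray]:
    """Stand-in for a SAM wrapper: one boolean mask per detected object."""
    h, w, _ = rgb.shape
    return [np.zeros((h, w), dtype=bool)]  # stub output

def clip_scores(rgb: np.ndarray, masks: list[np.ndarray], query: str) -> np.ndarray:
    """Stand-in for a CLIP wrapper: similarity of each masked crop to `query`."""
    return np.random.rand(len(masks))  # stub scores

def pick_place(pick_mask: np.ndarray, place_mask: np.ndarray) -> None:
    """Stand-in for a motion primitive parameterized by object masks."""
    print(f"pick {int(pick_mask.sum())}px object, place on {int(place_mask.sum())}px region")

# The kind of code the LLM would generate for "put the apple in the bowl":
def put_apple_in_bowl(rgb: np.ndarray) -> None:
    masks = segment_objects(rgb)                                    # SAM: find objects
    apple = int(np.argmax(clip_scores(rgb, masks, "a red apple")))  # CLIP: name them
    bowl = int(np.argmax(clip_scores(rgb, masks, "a bowl")))
    pick_place(masks[apple], masks[bowl])                           # execute primitive

put_apple_in_bowl(np.zeros((64, 64, 3), dtype=np.uint8))

Because the LLM only composes calls to these predefined APIs, no policy training is needed: swapping the instruction swaps the generated program.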
373 stars. No commits in the last 6 months.
Stars: 373
Forks: 22
Language: Python
License: —
Category: —
Last pushed: Jun 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenGVLab/Instruct2Act"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
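For programmatic access, a Python equivalent of the curl call above; this assumes only that the endpoint returns JSON, since the response schema is not documented on this page.

# Python equivalent of the curl example above; assumes the endpoint
# returns JSON (the response schema is not documented here).
import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/OpenGVLab/Instruct2Act"
with urllib.request.urlopen(URL) as resp:
    record = json.load(resp)
print(json.dumps(record, indent=2))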
Higher-rated alternatives
MantisAI/sieves
Plug-and-play document AI with zero-shot models.
xiaoya-li/Instruction-Tuning-Survey
Project for the paper "Instruction Tuning for Large Language Models: A Survey"
princeton-pli/STAT
Skill-Targeted Adaptive Training
TencentARC-QQ/TagGPT
TagGPT: Large Language Models are Zero-shot Multimodal Taggers
rafaelpierre/bullet
bullet: a zero-shot/few-shot, LLM-based text classification framework