JindongGu/Awesome-Prompting-on-Vision-Language-Model
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
The repository organizes curated papers across three VLM categories: multimodal-to-text generation (e.g., Flamingo), image-text matching (e.g., CLIP), and text-to-image generation (e.g., Stable Diffusion). Its taxonomies distinguish hard prompts (task instructions, in-context learning, retrieval-based prompting, chain-of-thought) from soft prompts (prompt tuning, prefix token tuning). The collection emphasizes methods that adapt VLMs while preserving base model weights, through fusion module architectures (encoder-decoder vs. decoder-only) and input augmentation strategies. Each paper entry includes linked implementations, the publication venue, and technical notes to support systematic research into prompt engineering across multimodal foundation models.
509 stars. No commits in the last 6 months.
Stars: 509
Forks: 40
Language: —
License: —
Category:
Last pushed: Mar 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/JindongGu/Awesome-Prompting-on-Vision-Language-Model"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
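For scripted access, the same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the URL path follows a category/owner/repo layout (inferred from the single curl example above, not from documented API behavior) and that the response body is JSON, which this page does not confirm. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is assumed
# to be <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL in the same shape as the curl command."""
    # Percent-encode each segment so unusual characters cannot break the path.
    path = "/".join(quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/{path}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body, assuming (unverified) JSON."""
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds and prints the URL; call fetch_quality(...) to hit the network.
    print(build_quality_url(
        "prompt-engineering",
        "JindongGu",
        "Awesome-Prompting-on-Vision-Language-Model",
    ))
```

Keeping URL construction separate from the network call makes the path logic easy to check offline, and avoids spending the 100 requests/day keyless quota during testing.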
Higher-rated alternatives
OpenDriveLab/DriveLM
[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
ShiZhengyan/PowerfulPromptFT
[NeurIPS 2023 Main Track] This is the repository for the paper titled "Don’t Stop Pretraining?...
MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for...
Lamorati92/LLMs-from-scratch
📚 Build and train your own GPT-like Large Language Model from scratch with clear guidance and...
mala-lab/NegPrompt
The official implementation of CVPR 24' Paper "Learning Transferable Negative Prompts for...