JindongGu/Awesome-Prompting-on-Vision-Language-Model

This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.

Quality score: 33 / 100 (Emerging)

Organizes curated papers across three distinct VLM categories: multimodal-to-text generation (e.g., Flamingo), image-text matching (e.g., CLIP), and text-to-image generation (e.g., Stable Diffusion). Its taxonomy distinguishes hard prompts (task instructions, in-context learning, retrieval-based prompting, chain-of-thought) from soft prompts (prompt tuning, prefix token tuning), and emphasizes methods that adapt VLMs while preserving base model weights, whether through fusion module architectures (encoder-decoder vs. decoder-only) or input augmentation strategies. Each paper entry includes linked implementations, the publication venue, and technical notes to support systematic research into prompt engineering across multimodal foundation models.
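As context for the hard-prompt category above: image-text matching models like CLIP are commonly prompted by wrapping class names in hand-written templates before text encoding. A minimal sketch (the template strings here are illustrative, not taken from any specific paper):

```python
# CLIP-style "hard" prompting: each class name is expanded into several
# hand-written template sentences. Templates below are illustrative examples.
TEMPLATES = [
    "a photo of a {}.",
    "a blurry photo of a {}.",
    "a sketch of a {}.",
]

def build_hard_prompts(class_names):
    """Expand each class name into one prompt per template (zero-shot style)."""
    return {name: [t.format(name) for t in TEMPLATES] for name in class_names}
```

In practice these strings are tokenized and passed through the model's text encoder; soft-prompt methods instead replace the template tokens with learnable embeddings while the encoder weights stay frozen.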

509 stars. No commits in the last 6 months.

No license · Stale (6 months) · No package · No dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 15 / 25


Stars: 509
Forks: 40
Language: (not listed)
License: none
Last pushed: Mar 18, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/JindongGu/Awesome-Prompting-on-Vision-Language-Model"

The API is open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.
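The same endpoint can also be called from Python with the standard library. A minimal sketch; the `Authorization: Bearer` header name for keyed requests is an assumption, so check the API docs for the real scheme:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str, api_key=None) -> dict:
    """Fetch the quality report as a dict (100 requests/day without a key)."""
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        # Header name is an assumption; the service may expect a different one.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example, `fetch_quality("prompt-engineering", "JindongGu", "Awesome-Prompting-on-Vision-Language-Model")` requests the same URL as the curl command above.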