JindongGu/Awesome-Prompting-on-Vision-Language-Model

This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.

Quality score: 33 / 100 (Emerging)

Organizes curated papers across three distinct VLM categories: multimodal-to-text generation (e.g., Flamingo), image-text matching (e.g., CLIP), and text-to-image generation (e.g., Stable Diffusion). Its taxonomy distinguishes hard prompts (task instructions, in-context learning, retrieval-based prompting, chain-of-thought) from soft prompts (prompt tuning, prefix token tuning), and emphasizes methods that adapt VLMs while preserving base model weights, whether through fusion module architectures (encoder-decoder vs. decoder-only) or input augmentation strategies. Each paper entry includes linked implementations, the publication venue, and technical notes to support systematic research into prompt engineering across multimodal foundation models.
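As context for the hard-prompt category above: image-text matching models like CLIP are commonly prompted by wrapping class names in hand-written templates before text encoding. A minimal sketch (the template strings here are illustrative, not taken from any specific paper):

```python
# CLIP-style "hard" prompting: each class name is expanded into several
# hand-written template sentences. Templates below are illustrative examples.
TEMPLATES = [
    "a photo of a {}.",
    "a blurry photo of a {}.",
    "a sketch of a {}.",
]

def build_hard_prompts(class_names):
    """Expand each class name into one prompt per template (zero-shot style)."""
    return {name: [t.format(name) for t in TEMPLATES] for name in class_names}
```

In practice these strings are tokenized and passed through the model's text encoder; soft-prompt methods instead replace the template tokens with learnable embeddings while the encoder weights stay frozen.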

509 stars. No commits in the last 6 months.

No license · Stale (6 months) · No package · No dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 8 / 25
Community 15 / 25


Stars: 509
Forks: 40
Language: (not listed)
License: none
Last pushed: Mar 18, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/JindongGu/Awesome-Prompting-on-Vision-Language-Model"

The API is open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.
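The same endpoint can also be called from Python with the standard library. A minimal sketch; the `Authorization: Bearer` header name for keyed requests is an assumption, so check the API docs for the real scheme:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality endpoint URL for a repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str, api_key=None) -> dict:
    """Fetch the quality report as a dict (100 requests/day without a key)."""
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        # Header name is an assumption; the service may expect a different one.
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example, `fetch_quality("prompt-engineering", "JindongGu", "Awesome-Prompting-on-Vision-Language-Model")` requests the same URL as the curl command above.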