microsoftarchive/promptbench

A unified evaluation framework for large language models

Archived · Score: 45 / 100 · Emerging

Provides modular support for prompt engineering techniques (few-shot chain-of-thought, emotion prompting), adversarial robustness evaluation via prompt attacks, and dynamic test data generation to mitigate contamination. Built on PyTorch with extensible components for datasets, models, and evaluation methods, integrating specialized frameworks like DyVal for dynamic evaluation and PromptEval for efficient multi-prompt assessment across standard benchmarks (MMLU, BigBench Hard, GLUE) and multi-modal datasets.


Archived · No Package · No Dependents

Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 19 / 25
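The overall score appears to be the sum of the four subscores above. A minimal sketch, assuming simple addition with no weighting (the weighting scheme is not stated on this page):

```python
# Subscores as shown on this page, each out of 25.
subscores = {
    "Maintenance": 0,
    "Adoption": 10,
    "Maturity": 16,
    "Community": 19,
}

# Assumption: the overall score is the unweighted sum of the subscores.
overall = sum(subscores.values())
print(overall)  # 45, matching the 45 / 100 shown above
```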


Stars: 2,787
Forks: 219
Language: Python
License: MIT
Last pushed: Feb 20, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/microsoftarchive/promptbench"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
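The endpoint URL follows a predictable ecosystem/owner/repo layout. A minimal sketch of composing it programmatically, with the base URL and path structure inferred from the curl example above (the helper name `quality_url` is hypothetical):

```python
# Base URL taken from the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Compose the quality-API URL for a repository (path layout assumed
    from the example: /<ecosystem>/<owner>/<repo>)."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

url = quality_url("prompt-engineering", "microsoftarchive", "promptbench")
print(url)
```

Fetching the URL with any HTTP client then returns the same data shown by the curl command.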