promptbench and prompt-evaluator

The two tools are direct competitors. PromptBench is the more established and comprehensive option: a unified evaluation framework for large language models. prompt-evaluator is a newer, GUI-driven tool for evaluating, testing, and comparing LLM prompts, with features such as token-usage tracking and result visualization.

                  promptbench        prompt-evaluator
Score             70 (Verified)      30 (Emerging)
Maintenance       10/25              6/25
Adoption          16/25              3/25
Maturity          25/25              9/25
Community         19/25              12/25
Stars             2,785              4
Forks             219                1
Downloads         288                n/a
Commits (30d)     0                  0
Language          Python             TypeScript
License           MIT                MIT
Risk flags        None               No package, no dependents

About promptbench

microsoft/promptbench

A unified evaluation framework for large language models
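
In practice, an evaluation run with promptbench loads a benchmark dataset, wraps a model behind a unified LLM interface, and scores one or more prompt templates against it. The minimal sketch below follows the quickstart pattern from the promptbench README; the class names (DatasetLoader, LLMModel, Prompt, InputProcess, OutputProcess, Eval) and their arguments are recalled from that quickstart and should be checked against the current documentation.

```python
import promptbench as pb

# Load a benchmark dataset (SST-2 sentiment classification here).
dataset = pb.DatasetLoader.load_dataset("sst2")

# Wrap a model behind promptbench's unified LLM interface.
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10, temperature=0.0001)

# Two candidate prompts to compare; {content} is filled from each dataset row.
prompts = pb.Prompt([
    "Classify the sentence as positive or negative: {content}",
    "Determine the sentiment of the sentence (positive or negative): {content}",
])

def proj_func(pred: str) -> int:
    # Map the model's free-text answer onto the dataset's label space.
    mapping = {"positive": 1, "negative": 0}
    return mapping.get(pred.strip().lower(), -1)

for prompt in prompts:
    preds, labels = [], []
    for data in dataset:
        input_text = pb.InputProcess.basic_format(prompt, data)  # fill the template
        raw_pred = model(input_text)                              # query the model
        preds.append(pb.OutputProcess.cls(raw_pred, proj_func))   # normalize the output
        labels.append(data["label"])
    # Accuracy of this prompt on the dataset.
    score = pb.Eval.compute_cls_accuracy(preds, labels)
    print(f"{score:.3f}  {prompt}")
```

The same loop structure extends to promptbench's other benchmarks and to its adversarial-prompt and robustness evaluations; only the dataset, prompts, and metric change.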

About prompt-evaluator

syamsasi99/prompt-evaluator

prompt-evaluator is an open-source toolkit for evaluating, testing, and comparing LLM prompts. It provides a GUI-driven workflow for running prompt tests, tracking token usage, visualizing results, and checking reliability across models from providers such as OpenAI, Anthropic (Claude), and Google (Gemini).
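
prompt-evaluator itself is a TypeScript GUI application, and its internal API is not shown here. As a rough illustration of the workflow it automates (send the same prompt to several models, record each answer and its token counts), here is a hedged Python sketch using the official OpenAI SDK; the model names are placeholders, and none of this is prompt-evaluator's own code.

```python
from openai import OpenAI  # assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment

client = OpenAI()

PROMPT = "Summarize the following text in one sentence: {text}"
SAMPLE = {"text": "PromptBench is a unified evaluation framework for large language models."}

# Placeholder model list; a GUI tool would let you pick these interactively.
MODELS = ["gpt-4o-mini", "gpt-4o"]

for model_name in MODELS:
    response = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": PROMPT.format(**SAMPLE)}],
    )
    usage = response.usage  # token accounting reported by the API
    print(model_name)
    print("  answer:", response.choices[0].message.content)
    print("  tokens:", usage.prompt_tokens, "prompt /", usage.completion_tokens, "completion")
```

prompt-evaluator wraps this kind of loop in a GUI and extends it to multiple providers; the sketch only shows the core idea of recording per-model answers and token counts.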


Scores are updated daily from GitHub, PyPI, and npm data.