promptbench and prompt-evaluator
The two tools overlap in purpose but differ in scope and approach: PromptBench is the more established and comprehensive of the two, a unified evaluation framework for large language models, while prompt-evaluator offers a lighter GUI-driven workflow for evaluating, testing, and comparing LLM prompts, with features like token usage tracking and result visualization.
About promptbench
microsoft/promptbench
A unified evaluation framework for large language models
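To illustrate the kind of evaluation PromptBench supports, below is a minimal sketch of a prompt-accuracy loop following the quickstart pattern in the project's README. The class names (DatasetLoader, LLMModel, Prompt, InputProcess, OutputProcess, Eval) reflect the library's documented API, but exact signatures can vary between versions, so treat this as illustrative rather than definitive.

import promptbench as pb
from tqdm import tqdm

# Load a benchmark dataset (SST-2 sentiment classification here).
dataset = pb.DatasetLoader.load_dataset("sst2")

# Load a model; PromptBench wraps both local and API-hosted LLMs.
model = pb.LLMModel(model="google/flan-t5-large", max_new_tokens=10, temperature=0.0001)

# Define one or more candidate prompts; {content} is filled in per example.
prompts = pb.Prompt([
    "Classify the sentence as positive or negative: {content}",
    "Determine the emotion of the following sentence as positive or negative: {content}",
])

def proj_func(pred):
    # Map the model's text output onto the dataset's integer labels.
    return {"positive": 1, "negative": 0}.get(pred, -1)

for prompt in prompts:
    preds, labels = [], []
    for data in tqdm(dataset):
        input_text = pb.InputProcess.basic_format(prompt, data)
        raw_pred = model(input_text)
        preds.append(pb.OutputProcess.cls(raw_pred, proj_func))
        labels.append(data["label"])
    # Score each prompt so candidates can be compared head to head.
    accuracy = pb.Eval.compute_cls_accuracy(preds, labels)
    print(f"{accuracy:.3f}  {prompt}")

The same loop structure applies across PromptBench's supported datasets and models, which is the sense in which it is a "unified" framework: swapping the dataset, model, or prompt set does not change the evaluation code.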
About prompt-evaluator
syamsasi99/prompt-evaluator
prompt-evaluator is an open-source toolkit for evaluating, testing, and comparing LLM prompts. It provides a GUI-driven workflow for running prompt tests, tracking token usage, visualizing results, and checking reliability across models from providers such as OpenAI, Anthropic (Claude), and Google (Gemini).