promptpex and promptly
These are complementary tools: PromptPex provides a framework for *generating* systematic test cases for prompts, while Promptly supplies a curated *collection* of ready-made prompts to evaluate; one automates test creation, the other provides the evaluation material.
About promptpex
microsoft/promptpex
Test Generation for Prompts
Automatically extracts output rules from natural language prompts and generates targeted unit tests to validate whether LLM responses comply with those rules across different models. Uses LLM-based evaluation to assess test outcomes and integrates with OpenAI Evals API for standardized test export and execution via GitHub Models.
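As a rough illustration of that pipeline, the sketch below chains the four steps in Python with the OpenAI SDK: extract rules from the prompt, generate a targeted test input, run the prompt under test, and use an LLM judge to check rule compliance. The model name, the instruction strings, and the `ask` helper are illustrative stand-ins, not PromptPex's actual implementation or prompts.

```python
# Minimal sketch of a PromptPex-style test loop; assumes the OpenAI Python SDK
# and an OPENAI_API_KEY in the environment. Model and prompts are placeholders.
from openai import OpenAI

client = OpenAI()

PROMPT_UNDER_TEST = "Summarize the user's text in exactly three bullet points."

def ask(system: str, user: str) -> str:
    """One chat completion; gpt-4o-mini is an arbitrary stand-in model."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": user}],
    )
    return resp.choices[0].message.content

# 1. Extract output rules from the natural-language prompt.
rules = ask("List the concrete output rules this prompt imposes, one per line.",
            PROMPT_UNDER_TEST)

# 2. Generate a test input that targets those rules.
test_input = ask("Write one short user input that exercises these rules:\n" + rules,
                 "Generate the test input only.")

# 3. Run the prompt under test on the generated input.
output = ask(PROMPT_UNDER_TEST, test_input)

# 4. LLM-as-judge: does the output comply with every extracted rule?
verdict = ask("Answer PASS or FAIL: does the output obey all of these rules?\n"
              f"Rules:\n{rules}\nOutput:\n{output}",
              "Judge now.")
print(verdict)
```

In the real tool this loop runs across multiple models, and the generated tests can be exported in a standardized format for the OpenAI Evals API rather than judged inline as here.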
About promptly
equinor/promptly
A prompt collection for testing and evaluation of LLMs.