LLM Testing Frameworks Prompt Engineering Tools

Tools for systematically testing, evaluating, and validating LLM-powered applications through unit tests, integration tests, regression detection, and failure analysis. Does NOT include prompt optimization, monitoring/observability, or general testing frameworks without LLM-specific features.

There are 37 LLM testing framework tools tracked. 1 scores 50 or higher (Established tier). The highest-rated is genieincodebottle/schemalock at 50/100 with 1 star and 277 monthly downloads.

Get all 37 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=prompt-engineering&subcategory=llm-testing-frameworks&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
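The same request can be made from a script. The following is a minimal Python sketch, assuming only what the curl command above shows: the endpoint URL and the `domain`, `subcategory`, and `limit` query parameters. The helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the payload is returned unmodified. Whether a `limit` above 20 returns all 37 projects in one call is also an assumption to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command; everything else here is illustrative.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain, subcategory, limit=20):
    """Build the query URL using the same parameters as the curl command."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and return the decoded JSON payload as-is.

    The response schema is not described on this page, so no fields are assumed.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    url = build_url("prompt-engineering", "llm-testing-frameworks")
    print(url)
    # Uncomment to send a real request (counts against the 100 requests/day limit):
    # print(json.dumps(fetch_projects(url), indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to inspect the request before spending any of the daily quota.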

#	Tool	Score	Tier	Stars	Language
1	genieincodebottle/schemalock LLM output contract testing CLI, define what your pipeline must return, test...	50	Established	1	JavaScript
2	joshualamerton/prompt-trace Prompt and response tracing for LLM workflows	36	Emerging	2	Python
3	antsanchez/prompto Interact with various LLMs in your browser (LangChain.js, Angular)	35	Emerging	17	HTML
4	Coolhand-Labs/coolhand-ruby Zero-config LLM cost & quality monitoring for Ruby apps - automatically log...	32	Emerging	9	Ruby
5	Coolhand-Labs/coolhand-python Zero-config LLM cost & quality monitoring for Python apps - automatically...	32	Emerging	1	Python
6	suhjohn/llm-workbench UI for testing prompts across various datasets locally	28	Experimental	13	TypeScript
7	atjsh/llmlingua-2-js JavaScript/TypeScript implementation of LLMLingua-2 (Experimental)	28	Experimental	22	TypeScript
8	adarshM84/TextLLaMA Transform your writing with TextLLaMA! ✍️🚀 Simplify grammar, translate...	27	Experimental	15	HTML
9	dzhng/llamaflow The Typescript-first prompt engineering toolkit for working with chat based LLMs.	27	Experimental	112	TypeScript
10	Cre4T3Tiv3/llm-prompt-debugger Clean UI for LLM development workflows with prompt versioning and model...	26	Experimental	48	TypeScript
11	sazed5055/llmtest pytest for LLM apps - Test for grounding failures, prompt injection,...	25	Experimental	3	Python
12	drorIvry/consisTent A Comprehensive Testing Framework for Prompts	25	Experimental	3	Python
13	parea-ai/parea-sdk-ts TypeScript SDK for experimenting, testing, evaluating & monitoring...	24	Experimental	4	TypeScript
14	rawveg/intellillm-playground LLM Playground that works with Open Router	22	Experimental	—	TypeScript
15	CodeForgeNet/tuneprompt Industrial-grade testing framework for LLM prompts	22	Experimental	—	TypeScript
16	elijahmuimi/llm-log Provide structured JSONL logging for large language models to simplify data...	22	Experimental	—	C++
17	anurag-aryan-tech/Mafia-Mediator-Dashboard A Python + Tkinter desktop dashboard for mediating Mafia games with LLM...	20	Experimental	1	Python
18	yasemineren/Typesentry LLM evaluation harness for TypeScript: adversarial suites, static checks,...	19	Experimental	—	TypeScript
19	VebjornNyvoll/promptcanary Lightweight prompt regression testing for your existing test suite. Test LLM...	19	Experimental	—	TypeScript
20	RahulMK22/llmtest 🚀 Comprehensive testing framework for LLM applications with semantic...	16	Experimental	1	Python
21	Mattbusel/prompt-observatory Unified LLM interpretability dashboard — real-time token streams,...	16	Experimental	2	Python
22	suzakuzhang/tarot-local-test An AI tarot reading web app with fixed card meanings and LLM-generated...	15	Experimental	1	Python
23	WilliamK112/prompttrace Prompt engineering and LLM evaluation framework with trace visualization,...	15	Experimental	1	HTML
24	YagneshKhamar/phasio Jest-style testing for LLM prompts. Version prompts, run evals across OpenAI...	14	Experimental	—	TypeScript
25	poyro/poyro Test your web app LLM integrations using existing testing frameworks....	14	Experimental	40	TypeScript
26	KristopherZlo/promptlab Evala is a team workspace for prompt engineering, AI experiments,...	14	Experimental	—	PHP
27	pavankumarinfo/ai-testing-healthcare Public whitepaper on AI testing strategies in healthcare using prompt...	13	Experimental	2	—
28	calibrtr/llm-prompt-test LLM Prompt Test helps you test Large Language Models (LLMs) prompts to...	13	Experimental	5	TypeScript
29	radoslaw-sz/maia A pytest-based framework for testing multi AI agents systems. It provides a...	12	Experimental	1	TypeScript
30	Yuankai619/LLM-Generated-web-and-Playwright-E2E-Testing Experiment about using LLM to generate web pages that meet the requirements...	12	Experimental	13	TypeScript
31	Omnia9789/ai-unit-test-generator-cli LLM-powered Python test generator CLI with single-function...	11	Experimental	—	Python
32	sphinx010/testAignite_ TestAIgnite: an enterprise Cypress framework using Llama-3, Mixtral, and...	11	Experimental	—	JavaScript
33	sphinx010/testAIgnite TestAIgnite: an enterprise Cypress framework using Llama-3, Mixtral, and...	11	Experimental	—	JavaScript
34	cktang88/system-prompt-tester Test system prompts	11	Experimental	—	TypeScript
35	quantiauy/llmunit LLMUnit is a developer-first platform designed to bring the rigors of unit...	11	Experimental	—	TypeScript
36	LankeSathwik7/LLM-Regression-Lab Cloud-hosted LLM regression testing lab with eval suites, run diffs,...	11	Experimental	—	TypeScript
37	amitpuri/llm-playground LLM Playground - Demo Solution	11	Experimental	—	Python

Comparisons in this category

coolhand-ruby and coolhand-python (32 vs 32)