LLM Testing Frameworks Prompt Engineering Tools
Tools for systematically testing, evaluating, and validating LLM-powered applications through unit tests, integration tests, regression detection, and failure analysis. Does NOT include prompt optimization, monitoring/observability, or general testing frameworks without LLM-specific features.
There are 37 LLM testing framework tools tracked. One scores 50 or above (Established tier). The highest-rated is genieincodebottle/schemalock at 50/100, with 1 star and 277 monthly downloads.
Get all 37 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=prompt-engineering&subcategory=llm-testing-frameworks&limit=20"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
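The same request can be made from a script. The following is a minimal Python sketch using only the standard library. It assumes the endpoint returns a JSON body; the response schema is not described on this page, so the code parses the payload without assuming any field names. `build_url` and `fetch_projects` are hypothetical helper names, and the default `limit` of 20 simply mirrors the curl command above (whether larger values are accepted is not stated here).

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken verbatim from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="prompt-engineering",
              subcategory="llm-testing-frameworks",
              limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and decode the body as JSON.

    Assumption: the response is UTF-8 encoded JSON. The structure of the
    decoded object is not documented here, so it is returned unchanged.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_projects(build_url())` performs the network request; each call counts toward the keyless limit of 100 requests/day mentioned above.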
| # | Tool | Score | Tier |
|---|---|---|---|
| 1 | genieincodebottle/schemalock: LLM output contract testing CLI, define what your pipeline must return, test... | 50 | Established |
| 2 | joshualamerton/prompt-trace: Prompt and response tracing for LLM workflows | | Emerging |
| 3 | antsanchez/prompto: Interact with various LLMs in your browser (LangChain.js, Angular) | | Emerging |
| 4 | Coolhand-Labs/coolhand-ruby: Zero-config LLM cost & quality monitoring for Ruby apps - automatically log... | | Emerging |
| 5 | Coolhand-Labs/coolhand-python: Zero-config LLM cost & quality monitoring for Python apps - automatically... | | Emerging |
| 6 | suhjohn/llm-workbench: UI for testing prompts across various datasets locally | | Experimental |
| 7 | atjsh/llmlingua-2-js: JavaScript/TypeScript implementation of LLMLingua-2 (Experimental) | | Experimental |
| 8 | adarshM84/TextLLaMA: Transform your writing with TextLLaMA! ✍️🚀 Simplify grammar, translate... | | Experimental |
| 9 | dzhng/llamaflow: The Typescript-first prompt engineering toolkit for working with chat based LLMs. | | Experimental |
| 10 | Cre4T3Tiv3/llm-prompt-debugger: Clean UI for LLM development workflows with prompt versioning and model... | | Experimental |
| 11 | sazed5055/llmtest: pytest for LLM apps - Test for grounding failures, prompt injection,... | | Experimental |
| 12 | drorIvry/consisTent: A Comprehensive Testing Framework for Prompts | | Experimental |
| 13 | parea-ai/parea-sdk-ts: TypeScript SDK for experimenting, testing, evaluating & monitoring... | | Experimental |
| 14 | rawveg/intellillm-playground: LLM Playground that works with Open Router | | Experimental |
| 15 | CodeForgeNet/tuneprompt: Industrial-grade testing framework for LLM prompts | | Experimental |
| 16 | elijahmuimi/llm-log: Provide structured JSONL logging for large language models to simplify data... | | Experimental |
| 17 | anurag-aryan-tech/Mafia-Mediator-Dashboard: A Python + Tkinter desktop dashboard for mediating Mafia games with LLM... | | Experimental |
| 18 | yasemineren/Typesentry: LLM evaluation harness for TypeScript: adversarial suites, static checks,... | | Experimental |
| 19 | VebjornNyvoll/promptcanary: Lightweight prompt regression testing for your existing test suite. Test LLM... | | Experimental |
| 20 | RahulMK22/llmtest: 🚀 Comprehensive testing framework for LLM applications with semantic... | | Experimental |
| 21 | Mattbusel/prompt-observatory: Unified LLM interpretability dashboard — real-time token streams,... | | Experimental |
| 22 | suzakuzhang/tarot-local-test: An AI tarot reading web app with fixed card meanings and LLM-generated... | | Experimental |
| 23 | WilliamK112/prompttrace: Prompt engineering and LLM evaluation framework with trace visualization,... | | Experimental |
| 24 | YagneshKhamar/phasio: Jest-style testing for LLM prompts. Version prompts, run evals across OpenAI... | | Experimental |
| 25 | poyro/poyro: Test your web app LLM integrations using existing testing frameworks.... | | Experimental |
| 26 | KristopherZlo/promptlab: Evala is a team workspace for prompt engineering, AI experiments,... | | Experimental |
| 27 | pavankumarinfo/ai-testing-healthcare: Public whitepaper on AI testing strategies in healthcare using prompt... | | Experimental |
| 28 | calibrtr/llm-prompt-test: LLM Prompt Test helps you test Large Language Models (LLMs) prompts to... | | Experimental |
| 29 | radoslaw-sz/maia: A pytest-based framework for testing multi AI agents systems. It provides a... | | Experimental |
| 30 | Yuankai619/LLM-Generated-web-and-Playwright-E2E-Testing: Experiment about using LLM to generate web pages that meet the requirements... | | Experimental |
| 31 | Omnia9789/ai-unit-test-generator-cli: LLM-powered Python test generator CLI with single-function... | | Experimental |
| 32 | sphinx010/testAignite_: TestAIgnite: an enterprise Cypress framework using Llama-3, Mixtral, and... | | Experimental |
| 33 | sphinx010/testAIgnite: TestAIgnite: an enterprise Cypress framework using Llama-3, Mixtral, and... | | Experimental |
| 34 | cktang88/system-prompt-tester: Test system prompts | | Experimental |
| 35 | quantiauy/llmunit: LLMUnit is a developer-first platform designed to bring the rigors of unit... | | Experimental |
| 36 | LankeSathwik7/LLM-Regression-Lab: Cloud-hosted LLM regression testing lab with eval suites, run diffs,... | | Experimental |
| 37 | amitpuri/llm-playground: LLM Playground - Demo Solution | | Experimental |