LLM Testing Frameworks Prompt Engineering Tools

Tools for systematically testing, evaluating, and validating LLM-powered applications through unit tests, integration tests, regression detection, and failure analysis. Does NOT include prompt optimization, monitoring/observability, or general testing frameworks without LLM-specific features.

There are 37 LLM testing framework tools tracked. One scores 50 or higher (Established tier). The highest-rated is genieincodebottle/schemalock at 50/100, with 1 star and 277 monthly downloads.

Get all 37 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=prompt-engineering&subcategory=llm-testing-frameworks&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
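
For use in a script rather than the shell, the same request can be sketched in Python with only the standard library. The endpoint and query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the sketch returns the parsed payload as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="prompt-engineering",
              subcategory="llm-testing-frameworks",
              limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch and parse the JSON payload (hypothetical helper).

    The response structure is an unknown here, so the caller should
    inspect the returned object before relying on any field names.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    payload = fetch_projects(build_url())
    print(type(payload))
```

The default `limit=20` mirrors the curl command. Whether a larger value returns all 37 projects in one call is not stated on this page, so check the response before assuming the list is complete.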

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | genieincodebottle/schemalock | LLM output contract testing CLI, define what your pipeline must return, test... | 50 | Established |
| 2 | joshualamerton/prompt-trace | Prompt and response tracing for LLM workflows | 36 | Emerging |
| 3 | antsanchez/prompto | Interact with various LLMs in your browser (LangChain.js, Angular) | 35 | Emerging |
| 4 | Coolhand-Labs/coolhand-ruby | Zero-config LLM cost & quality monitoring for Ruby apps - automatically log... | 32 | Emerging |
| 5 | Coolhand-Labs/coolhand-python | Zero-config LLM cost & quality monitoring for Python apps - automatically... | 32 | Emerging |
| 6 | suhjohn/llm-workbench | UI for testing prompts across various datasets locally | 28 | Experimental |
| 7 | atjsh/llmlingua-2-js | JavaScript/TypeScript implementation of LLMLingua-2 (Experimental) | 28 | Experimental |
| 8 | adarshM84/TextLLaMA | Transform your writing with TextLLaMA! ✍️🚀 Simplify grammar, translate... | 27 | Experimental |
| 9 | dzhng/llamaflow | The Typescript-first prompt engineering toolkit for working with chat based LLMs. | 27 | Experimental |
| 10 | Cre4T3Tiv3/llm-prompt-debugger | Clean UI for LLM development workflows with prompt versioning and model... | 26 | Experimental |
| 11 | sazed5055/llmtest | pytest for LLM apps - Test for grounding failures, prompt injection,... | 25 | Experimental |
| 12 | drorIvry/consisTent | A Comprehensive Testing Framework for Prompts | 25 | Experimental |
| 13 | parea-ai/parea-sdk-ts | TypeScript SDK for experimenting, testing, evaluating & monitoring... | 24 | Experimental |
| 14 | rawveg/intellillm-playground | LLM Playground that works with Open Router | 22 | Experimental |
| 15 | CodeForgeNet/tuneprompt | Industrial-grade testing framework for LLM prompts | 22 | Experimental |
| 16 | elijahmuimi/llm-log | Provide structured JSONL logging for large language models to simplify data... | 22 | Experimental |
| 17 | anurag-aryan-tech/Mafia-Mediator-Dashboard | A Python + Tkinter desktop dashboard for mediating Mafia games with LLM... | 20 | Experimental |
| 18 | yasemineren/Typesentry | LLM evaluation harness for TypeScript: adversarial suites, static checks,... | 19 | Experimental |
| 19 | VebjornNyvoll/promptcanary | Lightweight prompt regression testing for your existing test suite. Test LLM... | 19 | Experimental |
| 20 | RahulMK22/llmtest | 🚀 Comprehensive testing framework for LLM applications with semantic... | 16 | Experimental |
| 21 | Mattbusel/prompt-observatory | Unified LLM interpretability dashboard - real-time token streams,... | 16 | Experimental |
| 22 | suzakuzhang/tarot-local-test | An AI tarot reading web app with fixed card meanings and LLM-generated... | 15 | Experimental |
| 23 | WilliamK112/prompttrace | Prompt engineering and LLM evaluation framework with trace visualization,... | 15 | Experimental |
| 24 | YagneshKhamar/phasio | Jest-style testing for LLM prompts. Version prompts, run evals across OpenAI... | 14 | Experimental |
| 25 | poyro/poyro | Test your web app LLM integrations using existing testing frameworks.... | 14 | Experimental |
| 26 | KristopherZlo/promptlab | Evala is a team workspace for prompt engineering, AI experiments,... | 14 | Experimental |
| 27 | pavankumarinfo/ai-testing-healthcare | Public whitepaper on AI testing strategies in healthcare using prompt... | 13 | Experimental |
| 28 | calibrtr/llm-prompt-test | LLM Prompt Test helps you test Large Language Models (LLMs) prompts to... | 13 | Experimental |
| 29 | radoslaw-sz/maia | A pytest-based framework for testing multi AI agents systems. It provides a... | 12 | Experimental |
| 30 | Yuankai619/LLM-Generated-web-and-Playwright-E2E-Testing | Experiment about using LLM to generate web pages that meet the requirements... | 12 | Experimental |
| 31 | Omnia9789/ai-unit-test-generator-cli | LLM-powered Python test generator CLI with single-function... | 11 | Experimental |
| 32 | sphinx010/testAignite_ | TestAIgnite: an enterprise Cypress framework using Llama-3, Mixtral, and... | 11 | Experimental |
| 33 | sphinx010/testAIgnite | TestAIgnite: an enterprise Cypress framework using Llama-3, Mixtral, and... | 11 | Experimental |
| 34 | cktang88/system-prompt-tester | Test system prompts | 11 | Experimental |
| 35 | quantiauy/llmunit | LLMUnit is a developer-first platform designed to bring the rigors of unit... | 11 | Experimental |
| 36 | LankeSathwik7/LLM-Regression-Lab | Cloud-hosted LLM regression testing lab with eval suites, run diffs,... | 11 | Experimental |
| 37 | amitpuri/llm-playground | LLM Playground - Demo Solution | 11 | Experimental |

Comparisons in this category