C-you-know/Action-Based-LLM-Testing-Harness

Ranking Large Language Models using the Principle of Least Action! Built during my time at Knit Space, Hubbali under the guidance of Prof. Prakash Hegade.

/ 100

Experimental

No commits in the last 6 months.

Stale 6m No Package No Dependents

Maintenance 2 / 25

Adoption 4 / 25

Maturity 9 / 25

Community 0 / 25


Stars

Forks

—

Language

Python

License

MIT

Category

safety-robustness-evaluation

Last pushed

Jul 30, 2025

Commits (30d)


Safety Robustness Evaluation · 21 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/C-you-know/Action-Based-LLM-Testing-Harness"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
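The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed by this page, so inspect the raw response before relying on any field.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    owner_part = urllib.parse.quote(owner, safe="")
    repo_part = urllib.parse.quote(repo, safe="")
    return f"{BASE_URL}/{owner_part}/{repo_part}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and decode the body.

    Assumption: the body is JSON. The response schema is not documented
    on this page, so no field names are relied on here.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("C-you-know", "Action-Based-LLM-Testing-Harness")
# print(json.dumps(data, indent=2))
```

With the keyless limit of 100 requests/day, it is worth caching the result locally instead of calling `fetch_quality` on every run.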

Higher-rated alternatives

microsoft/OpenRCA

[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?

PacificAI/langtest

Deliver safe & effective language models

TrustGen/TrustEval-toolkit

[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative...

Babelscape/ALERT

Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language...

ChenWu98/agent-attack

[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
