burcgokden/lm-evaluation-harness-with-PLDR-LLM-kvg-cache

Fork of the LM Evaluation Harness suite, used to run the benchmarks in the paper "PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference"

/ 100

Experimental

No commits in the last 6 months.

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 9 / 25

Community 0 / 25


Stars

Forks

—

Language

Python

License

MIT

Category

safety-robustness-evaluation

Last pushed

Feb 25, 2025

Commits (30d)


Safety Robustness Evaluation · 21 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/burcgokden/lm-evaluation-harness-with-PLDR-LLM-kvg-cache"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
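The same request can be made from Python. The sketch below uses only the standard library and reuses the endpoint path from the curl command. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which this page does not state.

```python
import json
import urllib.request
from urllib.parse import quote

# Endpoint prefix taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def build_quality_url(owner: str, repo: str) -> str:
    """Return the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters stay inside it.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record for a repository.

    Assumption: the body is JSON. If the service returns something else,
    json.loads raises a ValueError that the caller should handle.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("burcgokden", "lm-evaluation-harness-with-PLDR-LLM-kvg-cache")` issues the same GET request as the curl command. Because the keyless limit is 100 requests/day, caching the result locally is advisable when polling.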

Higher-rated alternatives

PacificAI/langtest

Deliver safe & effective language models

microsoft/OpenRCA

[ICLR'25] OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?

Babelscape/ALERT

Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language...

TrustGen/TrustEval-toolkit

[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative...

ChenWu98/agent-attack

[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
