abhaymundhara/llm-benchmark-suite

Benchmark suite for evaluating LLMs and SLMs on coding and SE tasks. Features HumanEval, MBPP, SWE-bench, and BigCodeBench with an interactive Streamlit UI. Supports cloud APIs (OpenAI, Anthropic, Google) and local models via Ollama. Tracks pass rates, latency, token usage, and costs.

Quality score: 17 / 100 (Experimental)
No License, No Package, No Dependents
Maintenance: 10 / 25
Adoption: 2 / 25
Maturity: 5 / 25
Community: 0 / 25


Stars: 2
Forks:
Language: Python
License:
Category: code-generation
Last pushed: Feb 05, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ai-coding/abhaymundhara/llm-benchmark-suite"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
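For scripted access, the endpoint above can be called from Python. The sketch below builds the endpoint URL for an arbitrary owner/repo pair and parses a sample payload; the field names in the sample (`score`, `tier`, `breakdown`) are assumptions inferred from the values shown on this page, not a documented response schema.

```python
import json
from urllib.parse import quote

# Base endpoint as shown in the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/ai-coding"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a GitHub owner/repo pair."""
    return f"{BASE}/{quote(owner)}/{quote(repo)}"

# Hypothetical response shape, mirroring the fields on this page;
# the live API's actual payload may differ.
sample = json.loads("""{
  "score": 17,
  "tier": "Experimental",
  "breakdown": {"maintenance": 10, "adoption": 2, "maturity": 5, "community": 0}
}""")

url = quality_url("abhaymundhara", "llm-benchmark-suite")
print(url)
print(sample["score"], sample["tier"])
```

To fetch live data, pass `url` to any HTTP client (e.g. `urllib.request.urlopen`); with the free tier no authentication header is needed.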