abhaymundhara/llm-benchmark-suite
Benchmark suite for evaluating LLMs and SLMs on coding and SE tasks. Features HumanEval, MBPP, SWE-bench, and BigCodeBench with an interactive Streamlit UI. Supports cloud APIs (OpenAI, Anthropic, Google) and local models via Ollama. Tracks pass rates, latency, token usage, and costs.
Stars: 2
Forks: —
Language: Python
License: —
Category: —
Last pushed: Feb 05, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ai-coding/abhaymundhara/llm-benchmark-suite"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
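For programmatic use, the same endpoint can be called from Python. A minimal sketch, assuming only what the curl example above shows (the URL pattern `/api/v1/quality/ai-coding/{owner}/{repo}`); the shape of the JSON response is not documented here, so it is returned as-is:

```python
import json
from urllib.request import urlopen

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/ai-coding"


def api_url(owner: str, repo: str) -> str:
    """Build the quality-data URL for a given GitHub owner/repo."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as parsed JSON.

    No key is needed for up to 100 requests/day (per the note above);
    the response schema is an assumption, so callers should inspect
    the returned dict rather than rely on specific fields.
    """
    with urlopen(api_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(api_url("abhaymundhara", "llm-benchmark-suite"))
```

Note that `fetch_quality` performs a live network request; the URL helper alone is enough to verify the endpoint construction offline.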
Higher-rated alternatives
Dav1dde/glad
Multi-Language Vulkan/GL/GLES/EGL/GLX/WGL Loader-Generator based on the official specs.
Aleph-Alpha/ts-rs
Generate TypeScript bindings from Rust types
awtkns/fastapi-crudrouter
A dynamic FastAPI router that automatically creates CRUD routes for your models
apollographql/apollo-tooling
✏️ Apollo CLI for client tooling (Mostly replaced by Rover)
mmcloughlin/avo
Generate x86 Assembly with Go