davidset13/intelligence_eval
This lets any agent run LLM evaluation benchmarks. It currently supports only HLE and MMLU-Pro; support for additional benchmarks is planned.
No commits in the last 6 months.
Stars: 2
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Sep 07, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/agents/davidset13/intelligence_eval"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
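For programmatic access, here is a minimal Python sketch that calls the same endpoint with the requests library. The response schema is not documented on this page, so rather than assume field names, the loop simply prints whatever top-level JSON fields come back.

import requests

# Endpoint from the curl example above; no API key is needed for
# up to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/agents/davidset13/intelligence_eval"

resp = requests.get(URL, timeout=10)
resp.raise_for_status()

# Assumes the body is a JSON object; print its top-level fields,
# since the exact schema is not documented here.
for key, value in resp.json().items():
    print(f"{key}: {value}")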
Higher-rated alternatives
strands-agents/evals
A comprehensive evaluation framework for AI agents and LLM applications.
eve-mas/eve-parity
Equilibrium Verification Environment (EVE) is a formal verification tool for the automated...
usestrix/benchmarks
Evaluation harness for the Strix agent.
KazKozDev/murmur
A mix-of-agents orchestration system for distributed LLM processing.
tanvirbhachu/ai-bench
A CLI benchmark runner for quickly testing AI models.