davidset13/intelligence_eval
This allows any agent to use LLM evaluation benchmarks. It currently supports only HLE and MMLU-Pro; support for additional benchmarks is planned.
No commits in the last 6 months.
Stars: 2
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Sep 07, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/davidset13/intelligence_eval"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
Higher-rated alternatives
Cloud-CV/EvalAI
Evaluating state of the art in AI
fireindark707/Python-Schema-Matching
A Python tool using XGBoost and sentence-transformers to perform schema matching on tables.
graphbookai/graphbook
Visual AI development framework for training and inference of ML models, scaling pipelines, and...
visual-layer/fastdup
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and...
josh-ashkinaze/plurals
Plurals: A System for Guiding LLMs Via Simulated Social Ensembles