hyeonsangjeon/gdpval-realworks
Benchmark LLMs on real professional tasks, not academic puzzles. YAML-driven experiment pipeline + live React dashboard for GDPVal Gold Subset (220 tasks across 11 industries).
Stars: 11
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Mar 28, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/hyeonsangjeon/gdpval-realworks"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
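The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client. It assumes the three path segments after `/quality/` are a category, a repository owner, and a repository name (inferred from the curl URL above), and that the endpoint returns JSON. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path layout is /quality/<category>/<owner>/<repo>,
    as inferred from the single example URL on this page.
    """
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and parse the body.

    Assumption: the response body is JSON; the page does not document it.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only prints the URL; call fetch_quality(...) to actually hit the API.
    print(build_quality_url("mlops", "hyeonsangjeon", "gdpval-realworks"))
```

Calling `fetch_quality("mlops", "hyeonsangjeon", "gdpval-realworks")` performs the network request. Each call counts against the daily request limit described above.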
Higher-rated alternatives
mlflow/mlflow
The open source AI engineering platform. MLflow enables teams of all sizes to debug, evaluate,...
kitops-ml/kitops
An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets,...
aws-samples/mlops-e2e
MLOps End-to-End Example using Amazon SageMaker Pipeline, AWS CodePipeline and AWS CDK
tensorchord/envd
🏕️ Reproducible development environment for humans and agents
techiescamp/mlops-for-devops
MLOps for DevOps Engineers - A hands-on, project-based guide to Machine Learning Operations