Domain-Specific Benchmarks for AI Agents

3 domain-specific benchmark projects for AI agents are tracked. 1 scores above 50 (Established tier). The highest-rated is Tongyi-MAI/MobileWorld at 51/100 with 152 stars.

Get all 3 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=agents&subcategory=domain-specific-benchmarks&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
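The same request can be made from a script. The following is a minimal Python sketch, assuming the endpoint returns a JSON body when called without a key, as the curl command above suggests. The helper names `build_url` and `fetch_projects` are illustrative, not part of the API. The response schema is not documented here, so the sketch returns the decoded JSON as-is rather than assuming any field names.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="agents", subcategory="domain-specific-benchmarks", limit=20):
    """Build the query URL; defaults mirror the parameters in the curl example."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Fetch the URL and decode the body as JSON.

    Assumption: the body is valid JSON. Its structure is not specified
    in this document, so no fields are accessed here.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects(build_url())` performs one request, which counts toward the 100 requests/day limit for keyless access.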

| # | Agent | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | Tongyi-MAI/MobileWorld | Benchmarking Autonomous Mobile Agents in Agent-User Interactive and... | 51 | Established |
| 2 | OSU-NLP-Group/ScienceAgentBench | [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents... | 45 | Emerging |
| 3 | ml-dev-bench/ml-dev-bench | ML-Dev-Bench is a benchmark for evaluating AI agents against various ML... | 40 | Emerging |