AgentBench and LawBench
AgentBench scores
Maintenance: 10/25
Adoption: 10/25
Maturity: 16/25
Community: 19/25
LawBench scores
Maintenance: 0/25
Adoption: 10/25
Maturity: 9/25
Community: 22/25
AgentBench repository
Stars: 3,234
Forks: 241
Downloads: n/a
Commits (30d): 0
Language: Python
License: Apache-2.0
LawBench repository
Stars: 406
Forks: 70
Downloads: n/a
Commits (30d): 0
Language: Python
License: Apache-2.0
No Package
No Dependents
Stale 6m
No Package
No Dependents
About AgentBench
THUDM/AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
AgentBench comprises 8 diverse task environments (OS interaction, database queries, knowledge graphs, web shopping/browsing, card games, and puzzles), deployed in containers via Docker Compose. It evaluates agents through multi-turn interactions using function-calling prompts and integrates with AgentRL for end-to-end reinforcement learning workflows. It also provides standardized dev/test splits and performance leaderboards across different LLM implementations.
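To make the idea of a multi-turn, function-calling evaluation more concrete, here is a minimal sketch in Python. It does not use AgentBench's code or API. `ToolCall`, `ToyEnv`, `scripted_agent`, `run_episode`, and `success_rate` are hypothetical names invented for illustration, and a scripted function stands in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical structures: none of these names come from AgentBench.
@dataclass
class ToolCall:
    name: str
    args: Dict[str, int]


@dataclass
class ToyEnv:
    """Toy task: compute a sum with `add`, then `submit` it as the answer."""
    target: int
    max_turns: int = 5
    history: List[str] = field(default_factory=list)
    done: bool = False
    success: bool = False

    def step(self, call: ToolCall) -> str:
        """Execute one tool call and return the observation for the agent."""
        if call.name == "add":
            obs = str(call.args["a"] + call.args["b"])
        elif call.name == "submit":
            self.done = True
            self.success = call.args["answer"] == self.target
            obs = "accepted" if self.success else "rejected"
        else:
            obs = f"unknown tool: {call.name}"
        self.history.append(f"{call.name} -> {obs}")
        return obs


Agent = Callable[[List[str]], ToolCall]


def scripted_agent(observations: List[str]) -> ToolCall:
    """Stand-in for an LLM: adds 2 + 3, then submits the last observation."""
    if not observations:
        return ToolCall("add", {"a": 2, "b": 3})
    return ToolCall("submit", {"answer": int(observations[-1])})


def run_episode(env: ToyEnv, agent: Agent) -> bool:
    """Alternate agent tool calls and environment observations until done."""
    observations: List[str] = []
    for _ in range(env.max_turns):
        call = agent(observations)
        observations.append(env.step(call))
        if env.done:
            break
    return env.success


def success_rate(targets: List[int], agent: Agent) -> float:
    """Fraction of episodes the agent solves, one fresh environment each."""
    results = [run_episode(ToyEnv(target=t), agent) for t in targets]
    return sum(results) / len(results)
```

In this toy setup, `success_rate([5, 6], scripted_agent)` runs two episodes. The scripted agent always submits 5, so it solves only the first. A real benchmark would replace the scripted function with model calls and the toy environment with a containerized task.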
About LawBench
open-compass/LawBench
Benchmarking Legal Knowledge of Large Language Models
Scores are updated daily from GitHub, PyPI, and npm data.