AgentBench and LawBench

AgentBench (Established)

Maintenance 10/25

Adoption 10/25

Maturity 16/25

Community 19/25

Stars: 3,234

Forks: 241

Downloads: n/a

Commits (30d): 0

Language: Python

License: Apache-2.0

No Package, No Dependents

LawBench (Emerging)

Maintenance 0/25

Adoption 10/25

Maturity 9/25

Community 22/25

Stars: 406

Forks: 70

Downloads: n/a

Commits (30d): 0

Language: Python

License: Apache-2.0

Stale 6m, No Package, No Dependents

About AgentBench

THUDM/AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

AgentBench comprises 8 task environments, including OS interaction, database queries, knowledge graphs, web shopping and browsing, card games, and puzzles, with containerized deployment via Docker Compose. It evaluates agents through multi-turn interactions using function-calling prompts and integrates with AgentRL for end-to-end reinforcement learning workflows. It also provides standardized dev/test splits and leaderboards that compare different LLMs.
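To make the idea of multi-turn, function-calling evaluation concrete, here is a minimal sketch of such a loop. It is not AgentBench's code or API: the `CounterEnv` environment, the `run_episode` and `success_rate` helpers, the `add_one_agent` stand-in, and the shape of the function-call dictionary are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical toy environment (not part of AgentBench): the agent must
# call the "add" function until a counter reaches a target value.
@dataclass
class CounterEnv:
    target: int = 3
    value: int = 0

    def observe(self) -> str:
        return f"value={self.value}, target={self.target}"

    def step(self, call: Dict) -> bool:
        """Apply one function call; return True once the task is solved."""
        if call.get("name") == "add":
            self.value += int(call.get("arguments", {}).get("amount", 0))
        return self.value == self.target


# An agent maps the observation history to one function call.
Agent = Callable[[List[str]], Dict]


def run_episode(env: CounterEnv, agent: Agent, max_turns: int = 5) -> bool:
    """Multi-turn loop: observe, let the agent emit a call, apply it."""
    history: List[str] = []
    for _ in range(max_turns):
        history.append(env.observe())
        if env.step(agent(history)):
            return True
    return False


def success_rate(agent: Agent, targets: List[int]) -> float:
    """Fraction of episodes solved within the turn budget."""
    results = [run_episode(CounterEnv(target=t), agent) for t in targets]
    return sum(results) / len(results)


def add_one_agent(history: List[str]) -> Dict:
    # Stand-in for an LLM: always emits the same function call.
    return {"name": "add", "arguments": {"amount": 1}}
```

Calling `success_rate(add_one_agent, [1, 3, 5, 7])` runs four episodes with a budget of five turns each. In a real harness the agent would presumably wrap an LLM call and the environment would run in its own container, but the observe, act, and score cycle would likely look similar.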

About LawBench

open-compass/LawBench

Benchmarking Legal Knowledge of Large Language Models

Related comparisons

AgentBench and bigcodebench

AgentBench and MemoryAgentBench

Scores updated daily from GitHub, PyPI, and npm data.