LLM Agent Training Gyms (LLM Tools)

Gymnasium-style environments and frameworks for training LLM agents through reinforcement learning, multi-turn decision-making, and self-play. This category does not include general RL frameworks, agent orchestration platforms, or applications that use pre-trained agents.

There are 50 LLM agent training gym tools tracked. 3 score above 50 (Established tier). The highest-rated is Gen-Verse/LatentMAS at 53/100 with 800 stars. 1 of the top 10 is actively maintained.

Get all 50 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-agent-training-gyms&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
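The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_url` and `fetch_projects` are hypothetical, the response is assumed to be JSON (as the heading above suggests) with no particular schema assumed, and the default `limit=20` is copied from the curl command. Whether a larger `limit` returns all 50 projects is not confirmed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="llm-agent-training-gyms", limit=20):
    """Build the query URL; defaults mirror the curl example (hypothetical helper)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Send a GET request and decode the body as JSON.

    Assumption: the endpoint returns a JSON document. Its structure is not
    documented on this page, so the decoded object is returned unchanged.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` issues one request, which counts against the daily quota described above.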

# | Tool | Description | Score | Tier
1 | Gen-Verse/LatentMAS | Latent Collaboration in Multi-Agent Systems | 53 | Established
2 | ai4co/reevo | [NeurIPS 2024] ReEvo: Large Language Models as Hyper-Heuristics with... | 51 | Established
3 | SALT-NLP/collaborative-gym | Framework and toolkits for building and evaluating collaborative agents that... | 51 | Established
4 | lean-dojo/LeanCopilot | LLMs as Copilots for Theorem Proving in Lean | 48 | Emerging
5 | sethkarten/LLM-Economist | Official repository of the 2025 paper, LLM Economist: Large Population... | 47 | Emerging
6 | WooooDyy/AgentGym-RL | Code and implementations for the paper "AgentGym-RL: Training LLM Agents for... | 46 | Emerging
7 | datphamvn/HSEvo | [AAAI-25] HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven... | 43 | Emerging
8 | FusionBrainLab/gigaevo-core | Evolutionary algorithm that uses Large Language Models (LLMs) to... | 43 | Emerging
9 | WooooDyy/AgentGym | Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large... | 42 | Emerging
10 | GeminiLight/gen-mentor | [WWW '25 Oral - GenMentor] Official code of our paper "LLM-powered... | 42 | Emerging
11 | proger/haloop | Agent toolkit for 100 hours of speech and 10 GiB of text | 42 | Emerging
12 | axon-rl/gem | A Gym for Agentic LLMs | 42 | Emerging
13 | Alibaba-Quark/SSP | Search Self-Play: Pushing the Frontier of Agent Capability without Supervision | 39 | Emerging
14 | zju-vipa/Odyssey | Odyssey: Empowering Minecraft Agents with Open-World Skills | 37 | Emerging
15 | spiral-rl/spiral | SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent... | 36 | Emerging
16 | wellecks/llmstep | llmstep: [L]LM proofstep suggestions in Lean 4. | 34 | Emerging
17 | zjunlp/MachineSoM | [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social... | 33 | Emerging
18 | moment-timeseries-foundation-model/TimeSeriesGym | Official code for TimeSeriesGym: A Scalable Benchmark for (Time Series)... | 33 | Emerging
19 | thu-nics/MARSHAL | MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs | 32 | Emerging
20 | codezakh/DataEnvGym | A testbed for agents and environments that can automatically improve models... | 32 | Emerging
21 | atasoglu/toolsgen | A modular Python library for synthesizing tool-calling datasets from JSON... | 32 | Emerging
22 | bin123apple/InfantAgent | [NeurIPS 2025] A multimodal agent that can interact with its own PC in a... | 29 | Experimental
23 | wshi83/MedAgentGym | [ICLR'26] MedAgentGYM: Training LLM Agents for Code-Based Medical Reasoning at Scale | 28 | Experimental
24 | OpenMLRL/LLM_Collab_Code_Generation | LLM Collaboration for Code Generation | 26 | Experimental
25 | Human-Oriented-ATP/motivated-proof-facilitator | A graphical interface that makes it convenient to construct "motivated... | 25 | Experimental
26 | hg0428/Mar-PS | A Multi-Agent Reasoning Problem Solver. You build teams and they work... | 25 | Experimental
27 | Reason-Wang/ToolGen | [ICLR 2025] The official implementation of paper "ToolGen: Unified Tool... | 23 | Experimental
28 | blyhm/AgentGym-RL | 🤖 Train LLM agents for multi-turn decision-making with AgentGym-RL,... | 22 | Experimental
29 | MichaelvanLaar/proof-of-thought | TypeScript port of https://github.com/DebarghaG/proofofthought by DebarghaG. | 22 | Experimental
30 | NKAI-Decision-Team/HEP-LLM-play-StarCraftII | Hierarchical Expert Prompt for Large-Language-Models: An Approach Defeat... | 22 | Experimental
31 | OpenDFM/ibsen | [ACL 2024] Official code for "IBSEN: Director-Actor Agent Collaboration for... | 22 | Experimental
32 | KevinHaylett/CorpusAncora | Geofinitism: The Geometry of Language and Thought | 22 | Experimental
33 | TobyYang7/TwinMarket | [NeurIPS 2025] A multi-agent framework that leverages LLMs to simulate... | 20 | Experimental
34 | zjunlp/predict-before-execute | Can We Predict Before Executing Machine Learning Agents? | 19 | Experimental
35 | JLanghamLopez/prisoners-dilemma | The Iterated Prisoners Dilemma for LLM Agents | 19 | Experimental
36 | HATS-ICT/PersonaEvolve | [EMNLP 2025 Main] Official Repo for Paper: "Implicit Behavioral Alignment of... | 19 | Experimental
37 | fannie1208/W4S | [COLM2025] "Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors" | 18 | Experimental
38 | liuxiaotong/knowlyr-gym | Gymnasium-style RL framework for LLM agent training: MDP environments,... | 17 | Experimental
39 | iphysresearch/evo-mcts | Official implementation of "Automated Algorithmic Discovery for... | 16 | Experimental
40 | iainjclark/synthetic-anthropology-cognition-lab | Research lab notebook and code for synthetic anthropology experiments using... | 15 | Experimental
41 | papachristoumarios/llm-network-formation | Supplementary Code and Data for "Network Formation and Dynamics among Multi-LLMs" | 15 | Experimental
42 | Tsumugii24/HAMLET | [ICLR 2026] Official code implementation for paper HAMLET: A Hierarchical... | 15 | Experimental
43 | chirindaopensource/bias_adjusted_LLM_agents_human_like_decision_making | End-to-End Python framework implementing bias-adjusted LLM agents for... | 11 | Experimental
44 | yuliu625/Simulate-the-Prisoners-Dilemma-with-Agents | An AutoGen-based simulation framework for the Prisoner's Dilemma. Explore... | 11 | Experimental
45 | Seldre99/HeRoN | Python code to implement HeRoN, a mediated RL–LLM framework to create NPCs... | 11 | Experimental
46 | opendilab/OpenPaL | Building open-ended embodied agent in battle royale FPS game | 11 | Experimental
47 | reveurmichael/space_mining | SpaceMining: a novel RL environment beyond LLM priors | 11 | Experimental
48 | lgy0404/LearnAct | Official code repo for the paper "LearnAct: Few-Shot Mobile GUI Agent with a... | 11 | Experimental
49 | AntonioSabbatellaUni/LLM-Multi-Agent-Optimization-Framework | Official implementation of MALBO (arXiv:2511.11788). Optimizes Multi-Agent... | 11 | Experimental
50 | SachinVarghese/telma | Toolkit Evaluator for Language Model Agents | 10 | Experimental

Comparisons in this category