LLM Agent Training Gyms (LLM Tools)

Gymnasium-style environments and frameworks for training LLM agents through reinforcement learning, multi-turn decision-making, and self-play. Does NOT include general RL frameworks, agent orchestration platforms, or applications using pre-trained agents.
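To make "Gymnasium-style" concrete, here is a minimal sketch of a multi-turn text environment that follows the `reset()`/`step()` contract of the Gymnasium API: `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`. The number-guessing task, the `GuessNumberEnv` class, and its reward values are hypothetical illustrations, not the API of any project listed below. The class does not subclass `gymnasium.Env` so that it runs without dependencies, and a scripted binary-search policy stands in for the LLM.

```python
import random
import re
from typing import Any, Dict, Optional, Tuple


class GuessNumberEnv:
    """Hypothetical multi-turn text task following Gymnasium's reset()/step() contract."""

    def __init__(self, low: int = 1, high: int = 100, max_turns: int = 10) -> None:
        self.low, self.high, self.max_turns = low, high, max_turns
        self._rng = random.Random()
        self._target = low
        self._turn = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[str, Dict[str, Any]]:
        # Returns (observation, info), as gymnasium.Env.reset does.
        if seed is not None:
            self._rng.seed(seed)
        self._target = self._rng.randint(self.low, self.high)
        self._turn = 0
        return f"Guess an integer between {self.low} and {self.high}.", {}

    def step(self, action: str) -> Tuple[str, float, bool, bool, Dict[str, Any]]:
        # Returns (observation, reward, terminated, truncated, info), as gymnasium.Env.step does.
        # `action` is the agent's raw text reply; the first integer in it is taken as the guess.
        self._turn += 1
        match = re.search(r"-?\d+", action)
        if match is None:
            obs, reward, terminated = "No number found; reply with an integer.", -0.1, False
        else:
            guess = int(match.group())
            if guess == self._target:
                obs, reward, terminated = "Correct!", 1.0, True
            elif guess < self._target:
                obs, reward, terminated = "Higher.", 0.0, False
            else:
                obs, reward, terminated = "Lower.", 0.0, False
        # Truncate on the turn limit only if the episode did not already end on its own.
        truncated = not terminated and self._turn >= self.max_turns
        return obs, reward, terminated, truncated, {"turn": self._turn}


def run_episode(env: GuessNumberEnv, seed: int = 0) -> Tuple[float, int]:
    """Scripted binary-search policy standing in for an LLM agent (hypothetical)."""
    env.reset(seed=seed)
    lo, hi = env.low, env.high
    total = 0.0
    while True:
        guess = (lo + hi) // 2
        obs, reward, terminated, truncated, info = env.step(f"My guess is {guess}.")
        total += reward
        if terminated or truncated:
            return total, info["turn"]
        # Narrow the search interval using the text feedback.
        if obs == "Higher.":
            lo = guess + 1
        else:
            hi = guess - 1
```

In a training setup, the scripted policy would presumably be replaced by model calls, with the resulting (observation, action, reward) trajectory passed to an RL trainer; the environment side of the loop stays the same.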

This category tracks 50 LLM agent training gym tools. Three score above 50 (Established tier). The highest-rated is Gen-Verse/LatentMAS at 53/100, with 800 stars. One of the top 10 is actively maintained.

Get all 50 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=llm-agent-training-gyms&limit=20"

Open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
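For scripted access, the same request can be made with Python's standard library. This is a minimal sketch: it reproduces the query string of the curl command above and assumes the endpoint returns a JSON body, without assuming anything about that body's fields. Whether `limit` accepts values above 20 (for example, 50 to cover every tracked project) is an assumption to verify against the API. No API-key handling is shown because this page does not say how a key is passed. The helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint copied from the curl command; the helper names below are hypothetical.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the dataset URL. Limits other than 20 are an untested assumption."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(domain: str, subcategory: str, limit: int = 20, timeout: float = 30.0):
    """GET the URL and decode the JSON body; its schema is not documented on this page."""
    url = build_quality_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_quality("llm-tools", "llm-agent-training-gyms")` issues the same request as the curl command and counts against the same daily quota.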

#	Tool (repository and description)	Score (out of 100)	Tier	Stars	Language
1	Gen-Verse/LatentMAS Latent Collaboration in Multi-Agent Systems	53	Established	800	Python
2	ai4co/reevo [NeurIPS 2024] ReEvo: Large Language Models as Hyper-Heuristics with...	51	Established	257	Python
3	SALT-NLP/collaborative-gym Framework and toolkits for building and evaluating collaborative agents that...	51	Established	124	Python
4	lean-dojo/LeanCopilot LLMs as Copilots for Theorem Proving in Lean	48	Emerging	1,244	C++
5	sethkarten/LLM-Economist Official repository of the 2025 paper, LLM Economist: Large Population...	47	Emerging	94	Python
6	WooooDyy/AgentGym-RL Code and implementations for the paper "AgentGym-RL: Training LLM Agents for...	46	Emerging	635	Python
7	datphamvn/HSEvo [AAAI-25] HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven...	43	Emerging	33	Python
8	FusionBrainLab/gigaevo-core Evolutionary algorithm that uses Large Language Models (LLMs) to...	43	Emerging	111	Python
9	WooooDyy/AgentGym Code and implementations for the ACL 2025 paper "AgentGym: Evolving Large...	42	Emerging	742	Python
10	GeminiLight/gen-mentor [WWW '25 Oral - GenMentor] Official code of our paper "LLM-powered...	42	Emerging	58	Python
11	proger/haloop Agent toolkit for 100 hours of speech and 10 GiB of text	42	Emerging	14	Python
12	axon-rl/gem A Gym for Agentic LLMs	42	Emerging	462	Python
13	Alibaba-Quark/SSP Search Self-Play: Pushing the Frontier of Agent Capability without Supervision	39	Emerging	97	Python
14	zju-vipa/Odyssey Odyssey: Empowering Minecraft Agents with Open-World Skills	37	Emerging	368	Python
15	spiral-rl/spiral SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent...	36	Emerging	177	Python
16	wellecks/llmstep llmstep: [L]LM proofstep suggestions in Lean 4.	34	Emerging	148	Python
17	zjunlp/MachineSoM [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social...	33	Emerging	120	Python
18	moment-timeseries-foundation-model/TimeSeriesGym Official code for TimeSeriesGym: A Scalable Benchmark for (Time Series)...	33	Emerging	34	Python
19	thu-nics/MARSHAL MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs	32	Emerging	39	Python
20	codezakh/DataEnvGym A testbed for agents and environments that can automatically improve models...	32	Emerging	28	Python
21	atasoglu/toolsgen A modular Python library for synthesizing tool-calling datasets from JSON...	32	Emerging	3	Python
22	bin123apple/InfantAgent [NeurIPS 2025] A multimodal agent that can interact with its own PC in a...	29	Experimental	35	Python
23	wshi83/MedAgentGym [ICLR'26] MedAgentGYM: Training LLM Agents for Code-Based Medical Reasoning at Scale	28	Experimental	84	Python
24	OpenMLRL/LLM_Collab_Code_Generation LLM Collaboration for Code Generation	26	Experimental	2	Python
25	Human-Oriented-ATP/motivated-proof-facilitator A graphical interface that makes it convenient to construct "motivated...	25	Experimental	—	TypeScript
26	hg0428/Mar-PS A Multi-Agent Reasoning Problem Solver. You build teams and they work...	25	Experimental	6	Python
27	Reason-Wang/ToolGen [ICLR 2025] The official implementation of paper "ToolGen: Unified Tool...	23	Experimental	174	Python
28	blyhm/AgentGym-RL 🤖 Train LLM agents for multi-turn decision-making with AgentGym-RL,...	22	Experimental	—	Python
29	MichaelvanLaar/proof-of-thought TypeScript port of https://github.com/DebarghaG/proofofthought by DebarghaG.	22	Experimental	—	TypeScript
30	NKAI-Decision-Team/HEP-LLM-play-StarCraftII Hierarchical Expert Prompt for Large-Language-Models: An Approch Defeat...	22	Experimental	53	Python
31	OpenDFM/ibsen [ACL 2024] Official code for "IBSEN: Director-Actor Agent Collaboration for...	22	Experimental	50	Python
32	KevinHaylett/CorpusAncora Geofinitism: The Geometry of Language and Thought	22	Experimental	—	—
33	TobyYang7/TwinMarket [NeurIPS 2025] A multi-agent framework that leverages LLMs to simulate...	20	Experimental	45	—
34	zjunlp/predict-before-execute Can We Predict Before Executing Machine Learning Agents?	19	Experimental	14	Python
35	JLanghamLopez/prisoners-dilemma The Iterated Prisoners Dilemma for LLM Agents	19	Experimental	—	Python
36	HATS-ICT/PersonaEvolve [EMNLP 2025 Main] Official Repo for Paper: "Implicit Behavioral Alignment of...	19	Experimental	7	C#
37	fannie1208/W4S [COLM2025] "Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors"	18	Experimental	55	Python
38	liuxiaotong/knowlyr-gym Gymnasium-style RL framework for LLM agent training — MDP environments,...	17	Experimental	3	Python
39	iphysresearch/evo-mcts Official implementation of "Automated Algorithmic Discovery for...	16	Experimental	11	Python
40	iainjclark/synthetic-anthropology-cognition-lab Research lab notebook and code for synthetic anthropology experiments using...	15	Experimental	—	Python
41	papachristoumarios/llm-network-formation Supplementary Code and Data for "Network Formation and Dynamics among Multi-LLMs"	15	Experimental	5	Jupyter Notebook
42	Tsumugii24/HAMLET [ICLR 2026] Official code implementation for paper HAMLET: A Hierarchical...	15	Experimental	1	—
43	chirindaopensource/bias_adjusted_LLM_agents_human_like_decision_making End-to-End Python framework implementing bias-adjusted LLM agents for...	11	Experimental	—	Jupyter Notebook
44	yuliu625/Simulate-the-Prisoners-Dilemma-with-Agents An AutoGen-based simulation framework for the Prisoner's Dilemma. Explore...	11	Experimental	—	Python
45	Seldre99/HeRoN Python code to implement HeRoN, a mediated RL–LLM framework to create NPCs...	11	Experimental	—	Python
46	opendilab/OpenPaL Building open-ended embodied agent in battle royale FPS game	11	Experimental	38	—
47	reveurmichael/space_mining SpaceMining: a novel RL environment beyond LLM priors	11	Experimental	—	Python
48	lgy0404/LearnAct Official code repo for the paper "LearnAct: Few-Shot Mobile GUI Agent with a...	11	Experimental	46	Python
49	AntonioSabbatellaUni/LLM-Multi-Agent-Optimization-Framework Official implementation of MALBO (arXiv:2511.11788). Optimizes Multi-Agent...	11	Experimental	—	Jupyter Notebook
50	SachinVarghese/telma Toolkit Evaluator for Language Model Agents	10	Experimental	1	Jupyter Notebook
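Because the Tool column packs the repository name and a truncated description into one field, the listing can be parsed mechanically. The sketch below is a hypothetical helper (`Row` and `parse_row` are illustrative names). It assumes the columns are tab-separated as shown, splits the Tool field at its first space, strips thousands separators from Stars, and treats the em-dash placeholder as a missing value. Applied to the first four rows, it reproduces the summary figures at the top of the page: three tools score above 50, and Gen-Verse/LatentMAS has the highest score.

```python
from dataclasses import dataclass
from typing import Optional

MISSING = "\u2014"  # em-dash placeholder the table uses for missing values


@dataclass
class Row:
    rank: int
    repo: str
    description: str
    score: int
    tier: str
    stars: Optional[int]
    language: Optional[str]


def parse_row(line: str) -> Row:
    rank, tool, score, tier, stars, language = line.split("\t")
    # The Tool column holds "owner/repo" followed by a free-text description.
    repo, _, description = tool.partition(" ")
    return Row(
        rank=int(rank),
        repo=repo,
        description=description,
        score=int(score),
        tier=tier,
        stars=None if stars == MISSING else int(stars.replace(",", "")),
        language=None if language == MISSING else language,
    )


# First four rows of the table, with the tab separators written explicitly.
SAMPLE = [
    "1\tGen-Verse/LatentMAS Latent Collaboration in Multi-Agent Systems\t53\tEstablished\t800\tPython",
    "2\tai4co/reevo [NeurIPS 2024] ReEvo: Large Language Models as Hyper-Heuristics with...\t51\tEstablished\t257\tPython",
    "3\tSALT-NLP/collaborative-gym Framework and toolkits for building and evaluating collaborative agents that...\t51\tEstablished\t124\tPython",
    "4\tlean-dojo/LeanCopilot LLMs as Copilots for Theorem Proving in Lean\t48\tEmerging\t1,244\tC++",
]

rows = [parse_row(line) for line in SAMPLE]
established = [r for r in rows if r.score > 50]
top = max(rows, key=lambda r: r.score)
most_starred = max(rows, key=lambda r: r.stars or 0)
print(len(established), top.repo, most_starred.repo)
# prints: 3 Gen-Verse/LatentMAS lean-dojo/LeanCopilot
```

The star counts show why rank and popularity diverge: lean-dojo/LeanCopilot has the most stars in this sample but ranks fourth by score.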

Comparisons in this category

AgentGym-RL vs. AgentGym (scores 46 vs. 42)