PRIME-RL/PRIME
Scalable RL solution for advanced reasoning of language models
Implements online RL with implicit process reward models (PRMs) that learn dense, token-level rewards directly from outcome labels, without requiring step-level annotations. The approach jointly trains a policy and a PRM, both initialized from the same SFT model, and uses RLOO advantage estimation to combine outcome and process rewards for PPO-style updates. Integrated with the veRL framework and optimized for math and coding reasoning tasks.
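The leave-one-out (RLOO) advantage estimation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not PRIME's actual implementation: the `implicit_process_rewards` helper and the `beta` scaling coefficient are hypothetical stand-ins, and the real code operates on token-level log-probabilities from the jointly trained PRM and a reference model.

```python
import numpy as np

def rloo_advantages(rewards):
    """RLOO: each of K responses sampled for the same prompt is
    baselined against the mean reward of the other K-1 responses."""
    rewards = np.asarray(rewards, dtype=float)
    k = len(rewards)
    baseline = (rewards.sum() - rewards) / (k - 1)
    return rewards - baseline

def implicit_process_rewards(prm_logprobs, ref_logprobs, beta=0.05):
    """Sketch of an implicit PRM's dense token reward: a scaled
    log-ratio between the PRM's and a reference model's token
    log-probabilities. `beta` here is an assumed coefficient."""
    return beta * (np.asarray(prm_logprobs) - np.asarray(ref_logprobs))

# Toy group of 4 responses to one prompt with binary outcome rewards.
outcome = [1.0, 0.0, 0.0, 1.0]
adv = rloo_advantages(outcome)
# By construction, the advantages within a group sum to zero.
```

In a combined scheme, each token's advantage would add its response's outcome-level RLOO advantage to the dense per-token process reward before the PPO update; the exact weighting is a design choice of the method.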
1,813 stars. No commits in the last 6 months.
Stars: 1,813
Forks: 104
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/PRIME-RL/PRIME"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives:
- open-thought/reasoning-gym: [NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
- Hmbown/Hegelion: Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)
- LLM360/Reasoning360: A repo for open research on building large reasoning models
- bowang-lab/BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25
- TsinghuaC3I/Awesome-RL-for-LRMs: A Survey of Reinforcement Learning for Large Reasoning Models