PRIME-RL/PRIME
Scalable RL solution for advanced reasoning of language models
Implements online RL with implicit process reward models (PRMs) that learn dense, token-level rewards directly from outcome labels, without requiring step-level annotations. The approach jointly trains a policy and a PRM, both initialized from the same SFT model, and uses RLOO advantage estimation to combine outcome and process rewards for PPO-style updates. Integrated with the veRL framework and optimized for math and coding reasoning tasks.
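The leave-one-out (RLOO) advantage estimation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not PRIME's actual implementation: the `implicit_process_rewards` helper and the `beta` scaling coefficient are hypothetical stand-ins, and the real code operates on token-level log-probabilities from the jointly trained PRM and a reference model.

```python
import numpy as np

def rloo_advantages(rewards):
    """RLOO: each of K responses sampled for the same prompt is
    baselined against the mean reward of the other K-1 responses."""
    rewards = np.asarray(rewards, dtype=float)
    k = len(rewards)
    baseline = (rewards.sum() - rewards) / (k - 1)
    return rewards - baseline

def implicit_process_rewards(prm_logprobs, ref_logprobs, beta=0.05):
    """Sketch of an implicit PRM's dense token reward: a scaled
    log-ratio between the PRM's and a reference model's token
    log-probabilities. `beta` here is an assumed coefficient."""
    return beta * (np.asarray(prm_logprobs) - np.asarray(ref_logprobs))

# Toy group of 4 responses to one prompt with binary outcome rewards.
outcome = [1.0, 0.0, 0.0, 1.0]
adv = rloo_advantages(outcome)
# By construction, the advantages within a group sum to zero.
```

In a combined scheme, each token's advantage would add its response's outcome-level RLOO advantage to the dense per-token process reward before the PPO update; the exact weighting is a design choice of the method.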
1,813 stars. No commits in the last 6 months.
Stars: 1,813
Forks: 104
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/PRIME-RL/PRIME"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives:
- open-thought/reasoning-gym: [NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
- Hmbown/Hegelion: Dialectical reasoning architecture for LLMs (Thesis → Antithesis → Synthesis)
- LLM360/Reasoning360: A repo for open research on building large reasoning models
- bowang-lab/BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model | NeurIPS '25
- TsinghuaC3I/Awesome-RL-for-LRMs: A Survey of Reinforcement Learning for Large Reasoning Models