RLHF Alignment Training for Transformer Models
Tools and frameworks for training language models using reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and related alignment techniques. Includes implementations of RLHF pipelines, preference learning methods, and safety-focused training approaches. Does NOT include general safety evaluation, jailbreak detection, or post-hoc alignment analysis without training components.
There are 106 RLHF alignment training projects tracked. One scores above 70 (Verified tier). The highest-rated is agentscope-ai/Trinity-RFT at 72/100, with 557 stars and 1,472 monthly downloads. Three of the top 10 are actively maintained.
Get all 106 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=rlhf-alignment-training&limit=106"
```

The API is open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
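If you prefer Python over curl, the request above can be sketched as follows. This is a minimal sketch: the URL and query parameters come from the curl example, but the response schema is an assumption (a JSON object with a `projects` list of `{"name": ..., "score": ...}` records), so adjust `top_projects` to whatever the endpoint actually returns.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Assemble the query URL for the quality endpoint."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"

def top_projects(payload: str, n: int = 5) -> list:
    # Hypothetical response shape: assumes the body is a JSON object
    # with a "projects" list of {"name": ..., "score": ...} records.
    projects = json.loads(payload).get("projects", [])
    return sorted(projects, key=lambda p: p["score"], reverse=True)[:n]
```

Fetch the URL with `urllib.request.urlopen(build_url(...))` or curl, then pass the body to `top_projects` to rank results locally.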
| # | Model | Description | Tier |
|---|---|---|---|
| 1 | agentscope-ai/Trinity-RFT | Trinity-RFT is a general-purpose, flexible and scalable framework designed... | Verified |
| 2 | OpenRLHF/OpenRLHF | An Easy-to-use, Scalable and High-performance Agentic RL Framework based on... | Established |
| 3 | huggingface/alignment-handbook | Robust recipes to align language models with human and AI preferences | Established |
| 4 | zjunlp/EasyEdit | [ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs. | Established |
| 5 | PKU-Alignment/align-anything | Align Anything: Training All-modality Model with Feedback | Established |
| 6 | hyunwoongko/nanoRLHF | nanoRLHF: from-scratch journey into how LLMs and RLHF really work. | Established |
| 7 | hiyouga/ChatGLM-Efficient-Tuning | Fine-tuning ChatGLM-6B with PEFT (efficient ChatGLM fine-tuning based on PEFT) | Emerging |
| 8 | opendilab/LightRFT | LightRFT: Light, Efficient, Omni-modal & Reward-model Driven Reinforcement... | Emerging |
| 9 | hscspring/hcgf | Humanable Chat Generative-model Fine-tuning (LLM fine-tuning) | Emerging |
| 10 | PKU-Alignment/safe-rlhf | Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from... | Emerging |
| 11 | Gen-Verse/dLLM-RL | [ICLR 2026] Official code for TraceRL: Revolutionizing post-training for... | Emerging |
| 12 | sinanuozdemir/oreilly-llm-rl-alignment | This training offers an intensive exploration into the frontier of... | Emerging |
| 13 | NVlabs/RLP | [ICLR 2026] Official PyTorch Implementation of RLP: Reinforcement as a... | Emerging |
| 14 | conceptofmind/LaMDA-rlhf-pytorch | Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding... | Emerging |
| 15 | RLHFlow/RLHF-Reward-Modeling | Recipes to train reward models for RLHF. | Emerging |
| 16 | hiyouga/FastEdit | 🩹 Editing large language models within 10 seconds ⚡ | Emerging |
| 17 | uclaml/SPIN | The official implementation of Self-Play Fine-Tuning (SPIN) | Emerging |
| 18 | OPTML-Group/Unlearn-Simple | [NeurIPS 2025] Official repo for "Simplicity Prevails: Rethinking Negative... | Emerging |
| 19 | tatsu-lab/alpaca_farm | A simulation framework for RLHF and alternatives. Develop your RLHF method... | Emerging |
| 20 | GithubX-F/DynaMO-RL | Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization... | Emerging |
| 21 | nickduran/align2-linguistic-alignment | ALIGN 2.0: Modern Python package for multi-level linguistic alignment... | Emerging |
| 22 | xyjigsaw/LLM-Pretrain-SFT | Scripts for LLM pre-training and fine-tuning (with/without LoRA, DeepSpeed) | Emerging |
| 23 | ZinYY/Online_RLHF | A PyTorch implementation of the paper "Provably Efficient Online RLHF with... | Emerging |
| 24 | pratyushasharma/laser | The Truth Is In There: Improving Reasoning in Language Models with... | Emerging |
| 25 | complex-reasoning/RPG | [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) | Emerging |
| 26 | WayneJin0918/SRUM | Official repo of the paper "SRUM: Fine-Grained Self-Rewarding for Unified... | Emerging |
| 27 | WangJingyao07/Awesome-GRPO | Codebase of GRPO: Implementations and Resources of GRPO and Its Variants | Emerging |
| 28 | l294265421/alpaca-rlhf | Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback)... | Emerging |
| 29 | NVlabs/Long-RL | Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) | Emerging |
| 30 | RishabSA/interp-refusal-tokens | We study whether categorical refusal tokens enable controllable and... | Emerging |
| 31 | jackaduma/Vicuna-LoRA-RLHF-PyTorch | A full pipeline to finetune the Vicuna LLM with LoRA and RLHF on consumer... | Emerging |
| 32 | nicola-decao/KnowledgeEditor | Code for Editing Factual Knowledge in Language Models | Emerging |
| 33 | daniel-furman/sft-demos | Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and... | Emerging |
| 34 | rosinality/halite | Acceleration framework for Human Alignment Learning | Emerging |
| 35 | tomekkorbak/pretraining-with-human-feedback | Code accompanying the paper Pretraining Language Models with Human Preferences | Emerging |
| 36 | openpsi-project/ReaLHF | Super-Efficient RLHF Training of LLMs with Parameter Reallocation | Emerging |
| 37 | AIFrameResearch/SPO | Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL... | Emerging |
| 38 | zjunlp/Mol-Instructions | [ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset... | Emerging |
| 39 | HKUNLP/icl-ceil | [ICML 2023] Code for our paper "Compositional Exemplars for In-context Learning". | Emerging |
| 40 | jackaduma/ChatGLM-LoRA-RLHF-PyTorch | A full pipeline to finetune the ChatGLM LLM with LoRA and RLHF on consumer... | Experimental |
| 41 | WooooDyy/BAPO | Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for... | Experimental |
| 42 | abenechehab/dicl | [ICLR 2025] Official implementation of DICL (Disentangled In-Context... | Experimental |
| 43 | qizhou000/UniEdit | [NeurIPS 2025 B & D] UniEdit: A Unified Knowledge Editing Benchmark for... | Experimental |
| 44 | NVlabs/NFT | Implementation of the Negative-aware Finetuning (NFT) algorithm for "Bridging... | Experimental |
| 45 | kaistAI/Janus | [NeurIPS 2024] Train LLMs with diverse system messages reflecting... | Experimental |
| 46 | tlc4418/llm_optimization | A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. | Experimental |
| 47 | CLAIRE-Labo/quantile-reward-policy-optimization | Official codebase for "Quantile Reward Policy Optimization: Alignment with... | Experimental |
| 48 | TideDra/VL-RLHF | An RLHF infrastructure for vision-language models | Experimental |
| 49 | jackaduma/Alpaca-LoRA-RLHF-PyTorch | A full pipeline to finetune the Alpaca LLM with LoRA and RLHF on consumer... | Experimental |
| 50 | holarissun/RewardModelingBeyondBradleyTerry | Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models... | Experimental |
| 51 | RLHFlow/Online-RLHF | A recipe for online RLHF and online iterative DPO. | Experimental |
| 52 | PKU-Alignment/beavertails | BeaverTails is a collection of datasets designed to facilitate research on... | Experimental |
| 53 | ZJLAB-AMMI/LLM4Teach | Python code to implement LLM4Teach, a policy distillation approach for... | Experimental |
| 54 | yaojin17/Unlearning_LLM | [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large... | Experimental |
| 55 | YJiangcm/LTE | [ACL 2024] Learning to Edit: Aligning LLMs with Knowledge Editing | Experimental |
| 56 | aerosta/rewardhackwatch | Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1... | Experimental |
| 57 | pangatlo/RL-100 | 🤖 Implement advanced robotic manipulation techniques using real-world... | Experimental |
| 58 | CJReinforce/PURE | Official code for the paper "Stop Summation: Min-Form Credit Assignment Is... | Experimental |
| 59 | liziniu/policy_optimization | Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" | Experimental |
| 60 | NiuTrans/Vision-LLM-Alignment | This repository contains the code for SFT, RLHF, and DPO, designed for... | Experimental |
| 61 | seonghyeonye/Flipped-Learning | [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models... | Experimental |
| 62 | nlp-uoregon/Okapi | Okapi: Instruction-tuned Large Language Models in Multiple Languages with... | Experimental |
| 63 | ksm26/Reinforcement-Learning-from-Human-Feedback | Embark on the "Reinforcement Learning from Human Feedback" course and align... | Experimental |
| 64 | twitter-research/multilingual-alignment-tpp | Code for reproducing the paper Improved Multilingual Language Model... | Experimental |
| 65 | astorfi/LLM-Alignment-Project | A comprehensive template for aligning large language models (LLMs) using... | Experimental |
| 66 | liziniu/ReMax | Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement... | Experimental |
| 67 | YukinoshitaKaren/Reason-KE | [EMNLP 2025 Findings] Robust Knowledge Editing via Explicit Reasoning Chains... | Experimental |
| 68 | InternLM/Spark | An official implementation of "SPARK: Synergistic Policy And Reward... | Experimental |
| 69 | Yellow4Submarine7/LLMDoctor | 🩺 Token-Level Flow-Guided Preference Optimization for Efficient Test-Time... | Experimental |
| 70 | mintaywon/IF_RLHF | Source code for "Understanding impacts of human feedback via influence functions" | Experimental |
| 71 | haozheji/exact-optimization | [ICML 2024] Official repository for EXO: Towards Efficient Exact... | Experimental |
| 72 | gao-g/prelude | Code for the paper "Aligning LLM Agents by Learning Latent Preference from... | Experimental |
| 73 | li-plus/nanoRLHF | Train a tiny LLaMA model from scratch to repeat your words using... | Experimental |
| 74 | Manohara-Ai/Reinforcement_Learning_Framework_to_Prevent_Jailbreaks | A reinforcement learning-based system designed to detect and prevent... | Experimental |
| 75 | pleiadian53/llm-lab | A research sandbox for LLM pretraining, fine-tuning (SFT, DPO, RLHF), and... | Experimental |
| 76 | wangclnlp/DeepSpeed-Chat-Extension | This repo contains extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF). | Experimental |
| 77 | RLHF-V/RLHF-V | [CVPR 2024] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from... | Experimental |
| 78 | thinkwee/NOVER | [EMNLP 2025] R1-Zero on ANY TASK | Experimental |
| 79 | VoxDroid/llm-wikipedia | A project for fine-tuning large language models (LLMs) on curated Wikipedia... | Experimental |
| 80 | Dylsimple60/RLHF_learn | 🤖 Enhance reinforcement learning stability and efficiency with advanced... | Experimental |
| 81 | RUCKBReasoning/CodeRM | Official code implementation for the ACL 2025 paper "Dynamic Scaling of... | Experimental |
| 82 | 313mystery303/vla0-trl | 🔍 Explore a minimal reimplementation of VLA-0 with TRL, achieving 90% LIBERO... | Experimental |
| 83 | rafaelvp-db/hf-finetune | Fine-tuning a GPT model using the Persuasion for Good dataset. | Experimental |
| 84 | bhimanbaghel/ResolveUnderOverEdit | Official implementation of "Resolving UnderEdit & OverEdit with Iterative &... | Experimental |
| 85 | 5663015/LLMs_train | A single codebase for instruction fine-tuning of large language models | Experimental |
| 86 | yihedeng9/rlhf-summary-notes | A brief and partial summary of RLHF algorithms. | Experimental |
| 87 | ssbuild/llm_rlhf | Reinforcement learning training for LLMs such as GPT-2, LLaMA, and BLOOM | Experimental |
| 88 | clam004/minichatgpt | Annotated tutorial of the Hugging Face TRL repo for reinforcement learning... | Experimental |
| 89 | SharathHebbar/sft_mathgpt2 | Supervised fine-tuning using the TRL library | Experimental |
| 90 | fake-it0628/jailbreak-defense | Jailbreak defense system based on hidden-state causal monitoring for LLMs | Experimental |
| 91 | Martin-qyma/TRM | From Faithfulness to Correctness: Generative Reward Models that Think Critically | Experimental |
| 92 | kylebrussell/cap-rlvr | CAP RLVR: Reinforcement Learning from Human Feedback for Legal Reasoning... | Experimental |
| 93 | PKU-Alignment/llms-resist-alignment | [ACL 2025 Best Paper] Language Models Resist Alignment | Experimental |
| 94 | ducnh279/Align-LLMs-with-DPO | Align a large language model (LLM) with the DPO loss | Experimental |
| 95 | sathishkumar67/GPT-2-IMDB-Sentiment-Fine-Tuning-with-PPO | Implements the Proximal Policy Optimization (PPO) algorithm to fine-tune a... | Experimental |
| 96 | Daddy-Myth/Fine-tuning-Flan-T5-RLHF | Aligning FLAN-T5 with Reinforcement Learning from Human Feedback (RLHF) for... | Experimental |
| 97 | closestfriend/efficient-domain-adaptation | Research repository for Brie: LLM-assisted data authoring methodology... | Experimental |
| 98 | Yousifus/rlhf_loop_humain | RLHF Loop System: a learning project with a monitoring dashboard, drift... | Experimental |
| 99 | balnarendrasapa/faq-llm | A course project for DSCI 6004 that deals with fine-tuning a pretrained... | Experimental |
| 100 | rxian/domain-alignment | Code for importance-weighted domain alignment, and the paper "Cross-Lingual... | Experimental |
| 101 | pradeepiyer/nothing-gpt | SFT + DPO fine-tuned model about Nothing. | Experimental |
| 102 | ma-spie/LLM_metaphor_detection | Repository for the paper "Literary Metaphor Detection with LLM Fine-Tuning... | Experimental |
| 103 | cluebbers/dpo-rlhf-paraphrase-types | Enhancing paraphrase-type generation using Direct Preference Optimization... | Experimental |
| 104 | fabiantoh98/llm-preference-learning | End-to-end LLM preference learning pipeline: training, evaluation, and... | Experimental |
| 105 | pladee42/email-dpo-agents | A comprehensive research framework for automated fundraising email... | Experimental |
| 106 | Jason-Wang313/Drift-Bench | Quantifying the "Safety Half-Life" of LLMs: a framework to measure how... | Experimental |