RLHF Alignment Training for Transformer Models

Tools and frameworks for training language models using reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and related alignment techniques. Includes implementations of RLHF pipelines, preference learning methods, and safety-focused training approaches. Does NOT include general safety evaluation, jailbreak detection, or post-hoc alignment analysis without training components.

There are 106 RLHF alignment training projects tracked. One scores above 70 (Verified tier). The highest-rated is agentscope-ai/Trinity-RFT at 72/100, with 557 stars and 1,472 monthly downloads. Three of the top 10 are actively maintained.

Get all 106 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=rlhf-alignment-training&limit=20"

The API is open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
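The curl call above can also be scripted. Below is a minimal Python sketch using only the standard library; the endpoint and query parameters come from the curl example, but the shape of the JSON payload (assumed here to be decodable with `json.load`) and the exact field names are assumptions, so inspect the response before relying on specific keys.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the dataset-quality endpoint URL with URL-encoded query parameters."""
    params = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{API_BASE}?{params}"


def fetch_projects(domain: str, subcategory: str, limit: int = 20):
    """Fetch and decode the JSON payload for one category.

    The response schema is not documented here, so callers should
    inspect the decoded object before assuming field names.
    """
    url = build_quality_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Same request as the curl example above (first 20 projects).
    print(build_quality_url("transformers", "rlhf-alignment-training", limit=20))
```

Raising `limit` (or paginating, if the API supports it) would be the natural way to retrieve all 106 entries rather than the first 20.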

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | agentscope-ai/Trinity-RFT | Trinity-RFT is a general-purpose, flexible and scalable framework designed... | 72 | Verified |
| 2 | OpenRLHF/OpenRLHF | An Easy-to-use, Scalable and High-performance Agentic RL Framework based on... | 69 | Established |
| 3 | huggingface/alignment-handbook | Robust recipes to align language models with human and AI preferences | 61 | Established |
| 4 | zjunlp/EasyEdit | [ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs. | 60 | Established |
| 5 | PKU-Alignment/align-anything | Align Anything: Training All-modality Model with Feedback | 53 | Established |
| 6 | hyunwoongko/nanoRLHF | nanoRLHF: from-scratch journey into how LLMs and RLHF really work. | 50 | Established |
| 7 | hiyouga/ChatGLM-Efficient-Tuning | Fine-tuning ChatGLM-6B with PEFT | 47 | Emerging |
| 8 | opendilab/LightRFT | LightRFT: Light, Efficient, Omni-modal & Reward-model Driven Reinforcement... | 47 | Emerging |
| 9 | hscspring/hcgf | Humanable Chat Generative-model Fine-tuning (LLM fine-tuning) | 46 | Emerging |
| 10 | PKU-Alignment/safe-rlhf | Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from... | 44 | Emerging |
| 11 | Gen-Verse/dLLM-RL | [ICLR 2026] Official code for TraceRL: Revolutionizing post-training for... | 44 | Emerging |
| 12 | sinanuozdemir/oreilly-llm-rl-alignment | This training offers an intensive exploration into the frontier of... | 43 | Emerging |
| 13 | NVlabs/RLP | [ICLR 2026] Official PyTorch Implementation of RLP: Reinforcement as a... | 40 | Emerging |
| 14 | conceptofmind/LaMDA-rlhf-pytorch | Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding... | 40 | Emerging |
| 15 | RLHFlow/RLHF-Reward-Modeling | Recipes to train reward models for RLHF. | 39 | Emerging |
| 16 | hiyouga/FastEdit | 🩹Editing large language models within 10 seconds⚡ | 37 | Emerging |
| 17 | uclaml/SPIN | The official implementation of Self-Play Fine-Tuning (SPIN) | 37 | Emerging |
| 18 | OPTML-Group/Unlearn-Simple | [NeurIPS 2025] Official repo for "Simplicity Prevails: Rethinking Negative... | 37 | Emerging |
| 19 | tatsu-lab/alpaca_farm | A simulation framework for RLHF and alternatives. Develop your RLHF method... | 35 | Emerging |
| 20 | GithubX-F/DynaMO-RL | Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization... | 35 | Emerging |
| 21 | nickduran/align2-linguistic-alignment | ALIGN 2.0: Modern Python package for multi-level linguistic alignment... | 35 | Emerging |
| 22 | xyjigsaw/LLM-Pretrain-SFT | Scripts for LLM pre-training and fine-tuning (with or without LoRA, DeepSpeed) | 35 | Emerging |
| 23 | ZinYY/Online_RLHF | A PyTorch implementation of the paper "Provably Efficient Online RLHF with... | 35 | Emerging |
| 24 | pratyushasharma/laser | The Truth Is In There: Improving Reasoning in Language Models with... | 34 | Emerging |
| 25 | complex-reasoning/RPG | [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) | 33 | Emerging |
| 26 | WayneJin0918/SRUM | Official repo of the paper "SRUM: Fine-Grained Self-Rewarding for Unified... | 33 | Emerging |
| 27 | WangJingyao07/Awesome-GRPO | Codebase of GRPO: Implementations and Resources of GRPO and Its Variants | 33 | Emerging |
| 28 | l294265421/alpaca-rlhf | Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback)... | 33 | Emerging |
| 29 | NVlabs/Long-RL | Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) | 33 | Emerging |
| 30 | RishabSA/interp-refusal-tokens | We study whether categorical refusal tokens enable controllable and... | 32 | Emerging |
| 31 | jackaduma/Vicuna-LoRA-RLHF-PyTorch | A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer... | 32 | Emerging |
| 32 | nicola-decao/KnowledgeEditor | Code for Editing Factual Knowledge in Language Models | 32 | Emerging |
| 33 | daniel-furman/sft-demos | Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and... | 31 | Emerging |
| 34 | rosinality/halite | Acceleration framework for Human Alignment Learning | 31 | Emerging |
| 35 | tomekkorbak/pretraining-with-human-feedback | Code accompanying the paper Pretraining Language Models with Human Preferences | 31 | Emerging |
| 36 | openpsi-project/ReaLHF | Super-Efficient RLHF Training of LLMs with Parameter Reallocation | 31 | Emerging |
| 37 | AIFrameResearch/SPO | Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL... | 30 | Emerging |
| 38 | zjunlp/Mol-Instructions | [ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset... | 30 | Emerging |
| 39 | HKUNLP/icl-ceil | [ICML 2023] Code for our paper "Compositional Exemplars for In-context Learning". | 30 | Emerging |
| 40 | jackaduma/ChatGLM-LoRA-RLHF-PyTorch | A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer... | 29 | Experimental |
| 41 | WooooDyy/BAPO | Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for... | 29 | Experimental |
| 42 | abenechehab/dicl | [ICLR 2025] Official implementation of DICL (Disentangled In-Context... | 29 | Experimental |
| 43 | qizhou000/UniEdit | [NeurIPS 2025 B & D] UniEdit: A Unified Knowledge Editing Benchmark for... | 29 | Experimental |
| 44 | NVlabs/NFT | Implementation of the Negative-aware Finetuning (NFT) algorithm for "Bridging... | 29 | Experimental |
| 45 | kaistAI/Janus | [NeurIPS 2024] Train LLMs with diverse system messages reflecting... | 29 | Experimental |
| 46 | tlc4418/llm_optimization | A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. | 29 | Experimental |
| 47 | CLAIRE-Labo/quantile-reward-policy-optimization | Official codebase for "Quantile Reward Policy Optimization: Alignment with... | 28 | Experimental |
| 48 | TideDra/VL-RLHF | An RLHF infrastructure for vision-language models | 28 | Experimental |
| 49 | jackaduma/Alpaca-LoRA-RLHF-PyTorch | A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer... | 28 | Experimental |
| 50 | holarissun/RewardModelingBeyondBradleyTerry | Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models... | 27 | Experimental |
| 51 | RLHFlow/Online-RLHF | A recipe for online RLHF and online iterative DPO. | 27 | Experimental |
| 52 | PKU-Alignment/beavertails | BeaverTails is a collection of datasets designed to facilitate research on... | 27 | Experimental |
| 53 | ZJLAB-AMMI/LLM4Teach | Python code to implement LLM4Teach, a policy distillation approach for... | 27 | Experimental |
| 54 | yaojin17/Unlearning_LLM | [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large... | 26 | Experimental |
| 55 | YJiangcm/LTE | [ACL 2024] Learning to Edit: Aligning LLMs with Knowledge Editing | 26 | Experimental |
| 56 | aerosta/rewardhackwatch | Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1... | 26 | Experimental |
| 57 | pangatlo/RL-100 | 🤖 Implement advanced robotic manipulation techniques using real-world... | 25 | Experimental |
| 58 | CJReinforce/PURE | Official code for the paper "Stop Summation: Min-Form Credit Assignment Is... | 25 | Experimental |
| 59 | liziniu/policy_optimization | Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" | 24 | Experimental |
| 60 | NiuTrans/Vision-LLM-Alignment | This repository contains the code for SFT, RLHF, and DPO, designed for... | 24 | Experimental |
| 61 | seonghyeonye/Flipped-Learning | [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models... | 24 | Experimental |
| 62 | nlp-uoregon/Okapi | Okapi: Instruction-tuned Large Language Models in Multiple Languages with... | 24 | Experimental |
| 63 | ksm26/Reinforcement-Learning-from-Human-Feedback | Embark on the "Reinforcement Learning from Human Feedback" course and align... | 23 | Experimental |
| 64 | twitter-research/multilingual-alignment-tpp | Code for reproducing the paper Improved Multilingual Language Model... | 23 | Experimental |
| 65 | astorfi/LLM-Alignment-Project | A comprehensive template for aligning large language models (LLMs) using... | 22 | Experimental |
| 66 | liziniu/ReMax | Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement... | 22 | Experimental |
| 67 | YukinoshitaKaren/Reason-KE | [EMNLP 2025 Findings] Robust Knowledge Editing via Explicit Reasoning Chains... | 22 | Experimental |
| 68 | InternLM/Spark | An official implementation of "SPARK: Synergistic Policy And Reward... | 22 | Experimental |
| 69 | Yellow4Submarine7/LLMDoctor | 🩺 Token-Level Flow-Guided Preference Optimization for Efficient Test-Time... | 21 | Experimental |
| 70 | mintaywon/IF_RLHF | Source code for "Understanding Impacts of Human Feedback via Influence Functions" | 21 | Experimental |
| 71 | haozheji/exact-optimization | [ICML 2024] Official repository for EXO: Towards Efficient Exact... | 20 | Experimental |
| 72 | gao-g/prelude | Code for the paper "Aligning LLM Agents by Learning Latent Preference from... | 20 | Experimental |
| 73 | li-plus/nanoRLHF | Train a tiny LLaMA model from scratch to repeat your words using... | 20 | Experimental |
| 74 | Manohara-Ai/Reinforcement_Learning_Framework_to_Prevent_Jailbreaks | A reinforcement learning-based system designed to detect and prevent... | 20 | Experimental |
| 75 | pleiadian53/llm-lab | A research sandbox for LLM pretraining, fine-tuning (SFT, DPO, RLHF), and... | 19 | Experimental |
| 76 | wangclnlp/DeepSpeed-Chat-Extension | This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF). | 19 | Experimental |
| 77 | RLHF-V/RLHF-V | [CVPR 2024] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from... | 18 | Experimental |
| 78 | thinkwee/NOVER | [EMNLP 2025] R1-Zero on ANY TASK | 18 | Experimental |
| 79 | VoxDroid/llm-wikipedia | A project for fine-tuning large language models (LLMs) on curated Wikipedia... | 18 | Experimental |
| 80 | Dylsimple60/RLHF_learn | 🤖 Enhance reinforcement learning stability and efficiency with advanced... | 17 | Experimental |
| 81 | RUCKBReasoning/CodeRM | Official code implementation for the ACL 2025 paper "Dynamic Scaling of... | 17 | Experimental |
| 82 | 313mystery303/vla0-trl | 🔍 Explore a minimal reimplementation of VLA-0 with TRL, achieving 90% LIBERO... | 17 | Experimental |
| 83 | rafaelvp-db/hf-finetune | Fine-tuning a GPT model using the Persuasion for Good dataset. | 16 | Experimental |
| 84 | bhimanbaghel/ResolveUnderOverEdit | Official implementation of "Resolving UnderEdit & OverEdit with Iterative &... | 16 | Experimental |
| 85 | 5663015/LLMs_train | A single codebase for instruction fine-tuning of large language models | 16 | Experimental |
| 86 | yihedeng9/rlhf-summary-notes | A brief and partial summary of RLHF algorithms. | 16 | Experimental |
| 87 | ssbuild/llm_rlhf | Reinforcement learning training for LLMs such as GPT-2, LLaMA, and BLOOM | 15 | Experimental |
| 88 | clam004/minichatgpt | Annotated tutorial of the Hugging Face TRL repo for reinforcement learning... | 15 | Experimental |
| 89 | SharathHebbar/sft_mathgpt2 | Supervised fine-tuning using the TRL library | 15 | Experimental |
| 90 | fake-it0628/jailbreak-defense | Jailbreak defense system based on hidden-state causal monitoring for LLMs | 15 | Experimental |
| 91 | Martin-qyma/TRM | From Faithfulness to Correctness: Generative Reward Models that Think Critically | 14 | Experimental |
| 92 | kylebrussell/cap-rlvr | CAP RLVR: Reinforcement Learning from Human Feedback for Legal Reasoning... | 14 | Experimental |
| 93 | PKU-Alignment/llms-resist-alignment | [ACL 2025 Best Paper] Language Models Resist Alignment | 14 | Experimental |
| 94 | ducnh279/Align-LLMs-with-DPO | Align a Large Language Model (LLM) with DPO loss | 13 | Experimental |
| 95 | sathishkumar67/GPT-2-IMDB-Sentiment-Fine-Tuning-with-PPO | Implements the Proximal Policy Optimization (PPO) algorithm to fine-tune a... | 12 | Experimental |
| 96 | Daddy-Myth/Fine-tuning-Flan-T5-RLHF | Aligning FLAN-T5 with Reinforcement Learning from Human Feedback (RLHF) for... | 12 | Experimental |
| 97 | closestfriend/efficient-domain-adaptation | Research repository for Brie: LLM-assisted data authoring methodology... | 12 | Experimental |
| 98 | Yousifus/rlhf_loop_humain | RLHF loop system: learning project with monitoring dashboard, drift... | 12 | Experimental |
| 99 | balnarendrasapa/faq-llm | Course project for DSCI 6004 on fine-tuning a pretrained... | 12 | Experimental |
| 100 | rxian/domain-alignment | Code for importance-weighted domain alignment, and the paper "Cross-Lingual... | 12 | Experimental |
| 101 | pradeepiyer/nothing-gpt | SFT + DPO fine-tuned model about Nothing. | 11 | Experimental |
| 102 | ma-spie/LLM_metaphor_detection | Repository for the paper "Literary Metaphor Detection with LLM Fine-Tuning... | 11 | Experimental |
| 103 | cluebbers/dpo-rlhf-paraphrase-types | Enhancing paraphrase-type generation using Direct Preference Optimization... | 11 | Experimental |
| 104 | fabiantoh98/llm-preference-learning | End-to-end LLM preference learning pipeline: training, evaluation, and... | 11 | Experimental |
| 105 | pladee42/email-dpo-agents | A comprehensive research framework for automated fundraising email... | 11 | Experimental |
| 106 | Jason-Wang313/Drift-Bench | Quantifying the "Safety Half-Life" of LLMs: A framework to measure how... | 11 | Experimental |
