RLHF Alignment Training for Transformer Models

Tools and frameworks for training language models using reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and related alignment techniques. Includes implementations of RLHF pipelines, preference learning methods, and safety-focused training approaches. Does NOT include general safety evaluation, jailbreak detection, or post-hoc alignment analysis without training components.

There are 106 RLHF alignment training projects tracked. One scores above 70 (Verified tier). The highest-rated is agentscope-ai/Trinity-RFT at 72/100, with 557 stars and 1,472 monthly downloads. Three of the top 10 are actively maintained.

Get all 106 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=rlhf-alignment-training&limit=20"

The API is open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
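The curl call above can also be scripted. Below is a minimal Python sketch using only the standard library; the endpoint and query parameters come from the curl example, but the shape of the JSON payload (assumed here to be decodable with `json.load`) and the exact field names are assumptions, so inspect the response before relying on specific keys.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the dataset-quality endpoint URL with URL-encoded query parameters."""
    params = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{API_BASE}?{params}"


def fetch_projects(domain: str, subcategory: str, limit: int = 20):
    """Fetch and decode the JSON payload for one category.

    The response schema is not documented here, so callers should
    inspect the decoded object before assuming field names.
    """
    url = build_quality_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Same request as the curl example above (first 20 projects).
    print(build_quality_url("transformers", "rlhf-alignment-training", limit=20))
```

Raising `limit` (or paginating, if the API supports it) would be the natural way to retrieve all 106 entries rather than the first 20.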

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | agentscope-ai/Trinity-RFT | Trinity-RFT is a general-purpose, flexible and scalable framework designed... | 72 | Verified |
| 2 | OpenRLHF/OpenRLHF | An Easy-to-use, Scalable and High-performance Agentic RL Framework based on... | 69 | Established |
| 3 | huggingface/alignment-handbook | Robust recipes to align language models with human and AI preferences | 61 | Established |
| 4 | zjunlp/EasyEdit | [ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs. | 60 | Established |
| 5 | PKU-Alignment/align-anything | Align Anything: Training All-modality Model with Feedback | 53 | Established |
| 6 | hyunwoongko/nanoRLHF | nanoRLHF: from-scratch journey into how LLMs and RLHF really work. | 50 | Established |
| 7 | hiyouga/ChatGLM-Efficient-Tuning | Fine-tuning ChatGLM-6B with PEFT | 47 | Emerging |
| 8 | opendilab/LightRFT | LightRFT: Light, Efficient, Omni-modal & Reward-model Driven Reinforcement... | 47 | Emerging |
| 9 | hscspring/hcgf | Humanable Chat Generative-model Fine-tuning (LLM fine-tuning) | 46 | Emerging |
| 10 | PKU-Alignment/safe-rlhf | Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from... | 44 | Emerging |
| 11 | Gen-Verse/dLLM-RL | [ICLR 2026] Official code for TraceRL: Revolutionizing post-training for... | 44 | Emerging |
| 12 | sinanuozdemir/oreilly-llm-rl-alignment | This training offers an intensive exploration into the frontier of... | 43 | Emerging |
| 13 | NVlabs/RLP | [ICLR 2026] Official PyTorch Implementation of RLP: Reinforcement as a... | 40 | Emerging |
| 14 | conceptofmind/LaMDA-rlhf-pytorch | Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding... | 40 | Emerging |
| 15 | RLHFlow/RLHF-Reward-Modeling | Recipes to train reward models for RLHF. | 39 | Emerging |
| 16 | hiyouga/FastEdit | 🩹Editing large language models within 10 seconds⚡ | 37 | Emerging |
| 17 | uclaml/SPIN | The official implementation of Self-Play Fine-Tuning (SPIN) | 37 | Emerging |
| 18 | OPTML-Group/Unlearn-Simple | [NeurIPS 2025] Official repo for "Simplicity Prevails: Rethinking Negative... | 37 | Emerging |
| 19 | tatsu-lab/alpaca_farm | A simulation framework for RLHF and alternatives. Develop your RLHF method... | 35 | Emerging |
| 20 | GithubX-F/DynaMO-RL | Dynamic Rollout Allocation and Advantage Modulation for Policy Optimization... | 35 | Emerging |
| 21 | nickduran/align2-linguistic-alignment | ALIGN 2.0: Modern Python package for multi-level linguistic alignment... | 35 | Emerging |
| 22 | xyjigsaw/LLM-Pretrain-SFT | Scripts for LLM pre-training and fine-tuning (with or without LoRA, DeepSpeed) | 35 | Emerging |
| 23 | ZinYY/Online_RLHF | A PyTorch implementation of the paper "Provably Efficient Online RLHF with... | 35 | Emerging |
| 24 | pratyushasharma/laser | The Truth Is In There: Improving Reasoning in Language Models with... | 34 | Emerging |
| 25 | complex-reasoning/RPG | [ICLR 2026] RPG: KL-Regularized Policy Gradient (https://arxiv.org/abs/2505.17508) | 33 | Emerging |
| 26 | WayneJin0918/SRUM | Official repo of the paper "SRUM: Fine-Grained Self-Rewarding for Unified... | 33 | Emerging |
| 27 | WangJingyao07/Awesome-GRPO | Codebase of GRPO: Implementations and Resources of GRPO and Its Variants | 33 | Emerging |
| 28 | l294265421/alpaca-rlhf | Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback)... | 33 | Emerging |
| 29 | NVlabs/Long-RL | Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) | 33 | Emerging |
| 30 | RishabSA/interp-refusal-tokens | We study whether categorical refusal tokens enable controllable and... | 32 | Emerging |
| 31 | jackaduma/Vicuna-LoRA-RLHF-PyTorch | A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer... | 32 | Emerging |
| 32 | nicola-decao/KnowledgeEditor | Code for Editing Factual Knowledge in Language Models | 32 | Emerging |
| 33 | daniel-furman/sft-demos | Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and... | 31 | Emerging |
| 34 | rosinality/halite | Acceleration framework for Human Alignment Learning | 31 | Emerging |
| 35 | tomekkorbak/pretraining-with-human-feedback | Code accompanying the paper Pretraining Language Models with Human Preferences | 31 | Emerging |
| 36 | openpsi-project/ReaLHF | Super-Efficient RLHF Training of LLMs with Parameter Reallocation | 31 | Emerging |
| 37 | AIFrameResearch/SPO | Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL... | 30 | Emerging |
| 38 | zjunlp/Mol-Instructions | [ICLR 2024] Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset... | 30 | Emerging |
| 39 | HKUNLP/icl-ceil | [ICML 2023] Code for our paper "Compositional Exemplars for In-context Learning". | 30 | Emerging |
| 40 | jackaduma/ChatGLM-LoRA-RLHF-PyTorch | A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer... | 29 | Experimental |
| 41 | WooooDyy/BAPO | Code for the paper "BAPO: Stabilizing Off-Policy Reinforcement Learning for... | 29 | Experimental |
| 42 | abenechehab/dicl | [ICLR 2025] Official implementation of DICL (Disentangled In-Context... | 29 | Experimental |
| 43 | qizhou000/UniEdit | [NeurIPS 2025 B & D] UniEdit: A Unified Knowledge Editing Benchmark for... | 29 | Experimental |
| 44 | NVlabs/NFT | Implementation of the Negative-aware Finetuning (NFT) algorithm for "Bridging... | 29 | Experimental |
| 45 | kaistAI/Janus | [NeurIPS 2024] Train LLMs with diverse system messages reflecting... | 29 | Experimental |
| 46 | tlc4418/llm_optimization | A repo for RLHF training and BoN over LLMs, with support for reward model ensembles. | 29 | Experimental |
| 47 | CLAIRE-Labo/quantile-reward-policy-optimization | Official codebase for "Quantile Reward Policy Optimization: Alignment with... | 28 | Experimental |
| 48 | TideDra/VL-RLHF | An RLHF infrastructure for vision-language models | 28 | Experimental |
| 49 | jackaduma/Alpaca-LoRA-RLHF-PyTorch | A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer... | 28 | Experimental |
| 50 | holarissun/RewardModelingBeyondBradleyTerry | Official implementation of the ICLR 2025 paper "Rethinking Bradley-Terry Models... | 27 | Experimental |
| 51 | RLHFlow/Online-RLHF | A recipe for online RLHF and online iterative DPO. | 27 | Experimental |
| 52 | PKU-Alignment/beavertails | BeaverTails is a collection of datasets designed to facilitate research on... | 27 | Experimental |
| 53 | ZJLAB-AMMI/LLM4Teach | Python code to implement LLM4Teach, a policy distillation approach for... | 27 | Experimental |
| 54 | yaojin17/Unlearning_LLM | [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large... | 26 | Experimental |
| 55 | YJiangcm/LTE | [ACL 2024] Learning to Edit: Aligning LLMs with Knowledge Editing | 26 | Experimental |
| 56 | aerosta/rewardhackwatch | Runtime detector for reward hacking and misalignment in LLM agents (89.7% F1... | 26 | Experimental |
| 57 | pangatlo/RL-100 | 🤖 Implement advanced robotic manipulation techniques using real-world... | 25 | Experimental |
| 58 | CJReinforce/PURE | Official code for the paper "Stop Summation: Min-Form Credit Assignment Is... | 25 | Experimental |
| 59 | liziniu/policy_optimization | Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" | 24 | Experimental |
| 60 | NiuTrans/Vision-LLM-Alignment | This repository contains the code for SFT, RLHF, and DPO, designed for... | 24 | Experimental |
| 61 | seonghyeonye/Flipped-Learning | [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models... | 24 | Experimental |
| 62 | nlp-uoregon/Okapi | Okapi: Instruction-tuned Large Language Models in Multiple Languages with... | 24 | Experimental |
| 63 | ksm26/Reinforcement-Learning-from-Human-Feedback | Embark on the "Reinforcement Learning from Human Feedback" course and align... | 23 | Experimental |
| 64 | twitter-research/multilingual-alignment-tpp | Code for reproducing the paper Improved Multilingual Language Model... | 23 | Experimental |
| 65 | astorfi/LLM-Alignment-Project | A comprehensive template for aligning large language models (LLMs) using... | 22 | Experimental |
| 66 | liziniu/ReMax | Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement... | 22 | Experimental |
| 67 | YukinoshitaKaren/Reason-KE | [EMNLP 2025 Findings] Robust Knowledge Editing via Explicit Reasoning Chains... | 22 | Experimental |
| 68 | InternLM/Spark | An official implementation of "SPARK: Synergistic Policy And Reward... | 22 | Experimental |
| 69 | Yellow4Submarine7/LLMDoctor | 🩺 Token-Level Flow-Guided Preference Optimization for Efficient Test-Time... | 21 | Experimental |
| 70 | mintaywon/IF_RLHF | Source code for "Understanding Impacts of Human Feedback via Influence Functions" | 21 | Experimental |
| 71 | haozheji/exact-optimization | [ICML 2024] Official repository for EXO: Towards Efficient Exact... | 20 | Experimental |
| 72 | gao-g/prelude | Code for the paper "Aligning LLM Agents by Learning Latent Preference from... | 20 | Experimental |
| 73 | li-plus/nanoRLHF | Train a tiny LLaMA model from scratch to repeat your words using... | 20 | Experimental |
| 74 | Manohara-Ai/Reinforcement_Learning_Framework_to_Prevent_Jailbreaks | A reinforcement learning-based system designed to detect and prevent... | 20 | Experimental |
| 75 | pleiadian53/llm-lab | A research sandbox for LLM pretraining, fine-tuning (SFT, DPO, RLHF), and... | 19 | Experimental |
| 76 | wangclnlp/DeepSpeed-Chat-Extension | This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF). | 19 | Experimental |
| 77 | RLHF-V/RLHF-V | [CVPR 2024] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from... | 18 | Experimental |
| 78 | thinkwee/NOVER | [EMNLP 2025] R1-Zero on ANY TASK | 18 | Experimental |
| 79 | VoxDroid/llm-wikipedia | A project for fine-tuning large language models (LLMs) on curated Wikipedia... | 18 | Experimental |
| 80 | Dylsimple60/RLHF_learn | 🤖 Enhance reinforcement learning stability and efficiency with advanced... | 17 | Experimental |
| 81 | RUCKBReasoning/CodeRM | Official code implementation for the ACL 2025 paper "Dynamic Scaling of... | 17 | Experimental |
| 82 | 313mystery303/vla0-trl | 🔍 Explore a minimal reimplementation of VLA-0 with TRL, achieving 90% LIBERO... | 17 | Experimental |
| 83 | rafaelvp-db/hf-finetune | Fine-tuning a GPT model using the Persuasion for Good dataset. | 16 | Experimental |
| 84 | bhimanbaghel/ResolveUnderOverEdit | Official implementation of "Resolving UnderEdit & OverEdit with Iterative &... | 16 | Experimental |
| 85 | 5663015/LLMs_train | A single codebase for instruction fine-tuning of large language models | 16 | Experimental |
| 86 | yihedeng9/rlhf-summary-notes | A brief and partial summary of RLHF algorithms. | 16 | Experimental |
| 87 | ssbuild/llm_rlhf | Reinforcement learning training for LLMs such as GPT-2, LLaMA, and BLOOM | 15 | Experimental |
| 88 | clam004/minichatgpt | Annotated tutorial of the Hugging Face TRL repo for reinforcement learning... | 15 | Experimental |
| 89 | SharathHebbar/sft_mathgpt2 | Supervised fine-tuning using the TRL library | 15 | Experimental |
| 90 | fake-it0628/jailbreak-defense | Jailbreak defense system based on hidden-state causal monitoring for LLMs | 15 | Experimental |
| 91 | Martin-qyma/TRM | From Faithfulness to Correctness: Generative Reward Models that Think Critically | 14 | Experimental |
| 92 | kylebrussell/cap-rlvr | CAP RLVR: Reinforcement Learning from Human Feedback for Legal Reasoning... | 14 | Experimental |
| 93 | PKU-Alignment/llms-resist-alignment | [ACL 2025 Best Paper] Language Models Resist Alignment | 14 | Experimental |
| 94 | ducnh279/Align-LLMs-with-DPO | Align a Large Language Model (LLM) with DPO loss | 13 | Experimental |
| 95 | sathishkumar67/GPT-2-IMDB-Sentiment-Fine-Tuning-with-PPO | Implements the Proximal Policy Optimization (PPO) algorithm to fine-tune a... | 12 | Experimental |
| 96 | Daddy-Myth/Fine-tuning-Flan-T5-RLHF | Aligning FLAN-T5 with Reinforcement Learning from Human Feedback (RLHF) for... | 12 | Experimental |
| 97 | closestfriend/efficient-domain-adaptation | Research repository for Brie: LLM-assisted data authoring methodology... | 12 | Experimental |
| 98 | Yousifus/rlhf_loop_humain | RLHF loop system: learning project with monitoring dashboard, drift... | 12 | Experimental |
| 99 | balnarendrasapa/faq-llm | Course project for DSCI 6004 on fine-tuning a pretrained... | 12 | Experimental |
| 100 | rxian/domain-alignment | Code for importance-weighted domain alignment, and the paper "Cross-Lingual... | 12 | Experimental |
| 101 | pradeepiyer/nothing-gpt | SFT + DPO fine-tuned model about Nothing. | 11 | Experimental |
| 102 | ma-spie/LLM_metaphor_detection | Repository for the paper "Literary Metaphor Detection with LLM Fine-Tuning... | 11 | Experimental |
| 103 | cluebbers/dpo-rlhf-paraphrase-types | Enhancing paraphrase-type generation using Direct Preference Optimization... | 11 | Experimental |
| 104 | fabiantoh98/llm-preference-learning | End-to-end LLM preference learning pipeline: training, evaluation, and... | 11 | Experimental |
| 105 | pladee42/email-dpo-agents | A comprehensive research framework for automated fundraising email... | 11 | Experimental |
| 106 | Jason-Wang313/Drift-Bench | Quantifying the "Safety Half-Life" of LLMs: A framework to measure how... | 11 | Experimental |
