Direct Preference Optimization Transformer Models
This list tracks 12 direct preference optimization projects. The highest-rated is stair-lab/mlhp, scoring 49/100 with 30 stars.
Get all 12 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=direct-preference-optimization&limit=20"
```

The API is open to everyone: 100 requests/day with no key needed, or 1,000/day with a free API key.
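Once you have the JSON from the endpoint above (e.g. via `curl` or `requests`), it can be filtered client-side. A minimal sketch, assuming the response contains a `projects` list of objects with `name`, `score`, and `tier` fields (the actual schema may differ):

```python
import json

# Sample payload mimicking an assumed response shape; the real API's
# field names and nesting are not documented here and may differ.
sample_response = json.dumps({
    "projects": [
        {"name": "stair-lab/mlhp", "score": 49, "tier": "Emerging"},
        {"name": "sahsaeedi/TPO", "score": None, "tier": "Experimental"},
    ]
})

def projects_by_tier(payload: str, tier: str) -> list[str]:
    """Return the names of all projects in the given quality tier."""
    data = json.loads(payload)
    return [p["name"] for p in data.get("projects", []) if p.get("tier") == tier]

print(projects_by_tier(sample_response, "Emerging"))  # ['stair-lab/mlhp']
```

Swap `sample_response` for the body of a live request to apply the same filter to the full list of 12 projects.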
| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | stair-lab/mlhp | Machine Learning from Human Preferences | 49 | Emerging |
| 2 | princeton-nlp/SimPO | [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward | | Emerging |
| 3 | uclaml/SPPO | The official implementation of Self-Play Preference Optimization (SPPO) | | Emerging |
| 4 | general-preference/general-preference-model | [ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for... | | Emerging |
| 5 | sail-sg/dice | Official implementation of Bootstrapping Language Models via DPO Implicit Rewards | | Emerging |
| 6 | JIA-Lab-research/Step-DPO | Implementation for "Step-DPO: Step-wise Preference Optimization for... | | Experimental |
| 7 | Meaquadddd/DPO-Shift | DPO-Shift: Shifting the Distribution of Direct Preference Optimization | | Experimental |
| 8 | li-plus/flash-preference | Accelerate LLM preference tuning via prefix sharing with a single line of code | | Experimental |
| 9 | chrisliu298/llm-unlearn-eco | [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts | | Experimental |
| 10 | sahsaeedi/TPO | [TMLR] Triple Preference Optimization | | Experimental |
| 11 | sugarandgugu/Simple-Trl-Training | Fine-tune large language models with the DPO algorithm; simple and easy to get started. | | Experimental |
| 12 | csm9493/efficient-llm-unlearning | Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs (ICLR 2025) | | Experimental |