Direct Preference Optimization Transformer Models
This list tracks 12 direct preference optimization projects. The highest-rated is stair-lab/mlhp, scoring 49/100 with 30 stars.
Get all 12 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=direct-preference-optimization&limit=20"
```

The API is open to everyone: 100 requests/day with no key needed, or 1,000/day with a free API key.
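Once you have the JSON from the endpoint above (e.g. via `curl` or `requests`), it can be filtered client-side. A minimal sketch, assuming the response contains a `projects` list of objects with `name`, `score`, and `tier` fields (the actual schema may differ):

```python
import json

# Sample payload mimicking an assumed response shape; the real API's
# field names and nesting are not documented here and may differ.
sample_response = json.dumps({
    "projects": [
        {"name": "stair-lab/mlhp", "score": 49, "tier": "Emerging"},
        {"name": "sahsaeedi/TPO", "score": None, "tier": "Experimental"},
    ]
})

def projects_by_tier(payload: str, tier: str) -> list[str]:
    """Return the names of all projects in the given quality tier."""
    data = json.loads(payload)
    return [p["name"] for p in data.get("projects", []) if p.get("tier") == tier]

print(projects_by_tier(sample_response, "Emerging"))  # ['stair-lab/mlhp']
```

Swap `sample_response` for the body of a live request to apply the same filter to the full list of 12 projects.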
| # | Model | Description | Score | Tier |
|---|---|---|---|---|
| 1 | stair-lab/mlhp | Machine Learning from Human Preferences | 49 | Emerging |
| 2 | princeton-nlp/SimPO | [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward | | Emerging |
| 3 | uclaml/SPPO | The official implementation of Self-Play Preference Optimization (SPPO) | | Emerging |
| 4 | general-preference/general-preference-model | [ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for... | | Emerging |
| 5 | sail-sg/dice | Official implementation of Bootstrapping Language Models via DPO Implicit Rewards | | Emerging |
| 6 | JIA-Lab-research/Step-DPO | Implementation for "Step-DPO: Step-wise Preference Optimization for... | | Experimental |
| 7 | Meaquadddd/DPO-Shift | DPO-Shift: Shifting the Distribution of Direct Preference Optimization | | Experimental |
| 8 | li-plus/flash-preference | Accelerate LLM preference tuning via prefix sharing with a single line of code | | Experimental |
| 9 | chrisliu298/llm-unlearn-eco | [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts | | Experimental |
| 10 | sahsaeedi/TPO | [TMLR] Triple Preference Optimization | | Experimental |
| 11 | sugarandgugu/Simple-Trl-Training | Fine-tune large language models with the DPO algorithm; simple and easy to get started. | | Experimental |
| 12 | csm9493/efficient-llm-unlearning | Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs (ICLR 2025) | | Experimental |