Direct Preference Optimization Transformer Models

There are 12 direct preference optimization models tracked. The highest-rated is stair-lab/mlhp at 49/100 with 30 stars.

Get all 12 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=direct-preference-optimization&limit=20"
```

The API is open to everyone at 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
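For programmatic access, a minimal Python sketch of a client for the endpoint above. The URL and query parameters come from the curl example; the response shape (a JSON object with a `projects` list whose entries carry `model` and `score` fields) is an assumption for illustration, not documented behavior, so adjust the field names to the actual payload.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def quality_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the quality-endpoint URL shown in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE}?{query}"

def fetch(url: str) -> dict:
    """Fetch and decode the JSON payload (makes a network call)."""
    with urlopen(url) as resp:
        return json.load(resp)

def top_models(payload: dict, min_score: int = 0) -> list[str]:
    """Return model names at or above a score threshold.

    NOTE: the "projects" / "model" / "score" field names are assumed,
    not taken from any published schema for this API.
    """
    return [p["model"] for p in payload.get("projects", [])
            if p.get("score", 0) >= min_score]

# Offline example with a hand-written payload in the assumed shape:
sample = {"projects": [{"model": "stair-lab/mlhp", "score": 49},
                       {"model": "princeton-nlp/SimPO", "score": 43}]}
print(top_models(sample, min_score=45))  # filters out entries below 45
```

A live call would be `top_models(fetch(quality_url("transformers", "direct-preference-optimization")))`, equivalent to piping the curl output through a JSON parser.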

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | stair-lab/mlhp | Machine Learning from Human Preferences | 49 | Emerging |
| 2 | princeton-nlp/SimPO | [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward | 43 | Emerging |
| 3 | uclaml/SPPO | The official implementation of Self-Play Preference Optimization (SPPO) | 42 | Emerging |
| 4 | general-preference/general-preference-model | [ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for... | 36 | Emerging |
| 5 | sail-sg/dice | Official implementation of Bootstrapping Language Models via DPO Implicit Rewards | 33 | Emerging |
| 6 | JIA-Lab-research/Step-DPO | Implementation for "Step-DPO: Step-wise Preference Optimization for... | 28 | Experimental |
| 7 | Meaquadddd/DPO-Shift | DPO-Shift: Shifting the Distribution of Direct Preference Optimization | 27 | Experimental |
| 8 | li-plus/flash-preference | Accelerate LLM preference tuning via prefix sharing with a single line of code | 26 | Experimental |
| 9 | chrisliu298/llm-unlearn-eco | [NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts | 25 | Experimental |
| 10 | sahsaeedi/TPO | [TMLR] Triple Preference Optimization | 23 | Experimental |
| 11 | sugarandgugu/Simple-Trl-Training | Fine-tuning large language models with the DPO algorithm; simple and easy to get started. | 23 | Experimental |
| 12 | csm9493/efficient-llm-unlearning | Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs (ICLR 2025) | 20 | Experimental |