Rahulkumar010/microDPO
microDPO: A minimalist, pure PyTorch implementation of Direct Preference Optimization. Inspired by nanoGPT, it strips away massive RLHF libraries to reveal the elegant math behind AI alignment. Demystify how LLMs learn human preferences with a single, highly readable file. Train a tiny aligned model on your laptop in minutes.
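The DPO objective the description refers to can be written down in a few lines. The sketch below is an illustrative, minimal implementation of the standard DPO loss for a single preference pair, not code taken from this repository; the function name and parameters are hypothetical, and plain `math` is used instead of PyTorch to keep it self-contained.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    Inputs are log-probabilities of the chosen and rejected responses
    under the policy being trained and under a frozen reference model.
    """
    # Implicit rewards: beta-scaled log-ratios against the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # Loss is -log sigmoid(margin): small when the chosen response
    # outscores the rejected one by a wide margin.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference exactly, the margin is zero and the loss is ln 2; moving probability mass toward the chosen response drives it lower.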
Stars: 1
Forks: —
Language: Python
License: MIT
Category:
Last pushed: Mar 16, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Rahulkumar010/microDPO"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
codelion/pts: Pivotal Token Search
RLHFlow/Directional-Preference-Alignment: Directional Preference Alignment
dannylee1020/openpo: Building synthetic data for preference tuning
DtYXs/Pre-DPO: Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
pspdada/Uni-DPO: [ICLR 2026] Official repository of "Uni-DPO: A Unified Paradigm for Dynamic Preference...