JIA-Lab-research/TGDPO
[ICML 2025] TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
This project offers a method to improve the performance of large language models (LLMs) by incorporating token-level reward guidance into Direct Preference Optimization (DPO). It takes existing preference training data and pre-trained token-level reward models as input, producing a fine-tuned LLM with higher response quality and win rates. It is aimed at AI researchers and machine learning engineers working on fine-tuning and aligning LLMs.
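To illustrate the general idea of token-level reward guidance in a DPO-style objective, here is a minimal sketch in plain Python. The function name, the specific weighting scheme, and all argument names are illustrative assumptions, not the paper's exact formulation; the paper should be consulted for the precise loss.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tgdpo_style_loss(policy_chosen, policy_rejected,
                     ref_chosen, ref_rejected,
                     w_chosen, w_rejected, beta=0.1):
    # Illustrative sketch (not the paper's exact loss): each per-token
    # log-probability ratio (policy vs. reference) is scaled by a token-level
    # reward weight w before summing. Uniform weights of 1.0 recover the
    # standard sequence-level DPO margin.
    margin_chosen = sum(w * (p - r) for w, p, r
                        in zip(w_chosen, policy_chosen, ref_chosen))
    margin_rejected = sum(w * (p - r) for w, p, r
                          in zip(w_rejected, policy_rejected, ref_rejected))
    # DPO-style negative log-sigmoid on the weighted margin difference.
    return -math.log(sigmoid(beta * (margin_chosen - margin_rejected)))
```

With all weights set to 1.0 and identical policy and reference log-probabilities, the margin difference is zero and the loss equals log 2, matching vanilla DPO at initialization.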
No commits in the last 6 months.
Use this if you are a researcher or engineer looking to boost the response quality and win rates of your fine-tuned large language models by leveraging token-level guidance.
Not ideal if you are looking for a plug-and-play solution for basic LLM deployment or if you do not have access to significant computational resources (like multiple high-end GPUs).
Stars: 10
Forks: —
Language: Python
License: —
Category: —
Last pushed: Jul 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JIA-Lab-research/TGDPO"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
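The response can be handled with a few lines of Python. The JSON shape below is a guess based on the fields shown on this page (stars, language, last pushed, 30-day commits); the actual API schema may use different key names, so treat this as a sketch rather than a reference client.

```python
import json

# Hypothetical response body inferred from the fields listed above;
# the real endpoint may name or nest these keys differently.
sample = '''{
  "repo": "JIA-Lab-research/TGDPO",
  "stars": 10,
  "language": "Python",
  "last_pushed": "2025-07-15",
  "commits_30d": 0
}'''

data = json.loads(sample)
# Flag repos with no commit activity in the last 30 days.
stale = data["commits_30d"] == 0
```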
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards