liziniu/cold_start_rl
Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?
Companion code for the blog post, aimed at AI/ML researchers and engineers working on LLM development and alignment. It applies specialized 'cold-start' fine-tuning methods such as GEM or ReMax to an LLM and its training data, producing a model that retains output diversity and is therefore more robust for subsequent Reinforcement Learning (RL) training on complex tasks like mathematical reasoning.
No commits in the last 6 months.
Use this if you are fine-tuning LLMs with Reinforcement Learning and find that traditional supervised fine-tuning reduces output diversity, limiting further performance gains.
Not ideal if you are looking for a general-purpose LLM fine-tuning tool or if you are not working specifically on RL-based LLM alignment.
Stars: 19
Forks: —
Language: Python
License: —
Category: —
Last pushed: Mar 09, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/liziniu/cold_start_rl"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
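The curl command above can also be issued from Python using only the standard library. A minimal sketch follows; the endpoint URL is taken from the example above, but the shape of the JSON response (field names such as "stars" or "language") is an assumption — inspect the actual payload before relying on it.

```python
# Sketch: fetch repo-quality data from the endpoint shown in the curl example.
# The URL pattern is from the example above; the response schema is assumed.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_quality_url(owner: str, repo: str) -> str:
    """Construct the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """GET the quality endpoint and parse the JSON body into a dict."""
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_quality("liziniu", "cold_start_rl")` issues the same request as the curl command above. With a free API key you would presumably pass it as a header, but the header name is not documented here, so it is omitted.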
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.