liziniu/cold_start_rl
Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs?
Companion code for the blog post, aimed at AI/ML researchers and engineers working on LLM development and alignment. It applies specialized 'cold-start' fine-tuning methods such as GEM or ReMax to an LLM and its training data, producing a model that retains output diversity and is therefore more robust for subsequent Reinforcement Learning (RL) training on complex tasks like mathematical reasoning.
No commits in the last 6 months.
Use this if you are fine-tuning LLMs with Reinforcement Learning and find that traditional supervised fine-tuning reduces output diversity, limiting further performance gains.
Not ideal if you are looking for a general-purpose LLM fine-tuning tool or if you are not working specifically on RL-based LLM alignment.
Stars: 19
Forks: —
Language: Python
License: —
Category: —
Last pushed: Mar 09, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/liziniu/cold_start_rl"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.
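The curl command above can also be issued from Python using only the standard library. A minimal sketch follows; the endpoint URL is taken from the example above, but the shape of the JSON response (field names such as "stars" or "language") is an assumption — inspect the actual payload before relying on it.

```python
# Sketch: fetch repo-quality data from the endpoint shown in the curl example.
# The URL pattern is from the example above; the response schema is assumed.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_quality_url(owner: str, repo: str) -> str:
    """Construct the per-repository quality endpoint URL."""
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """GET the quality endpoint and parse the JSON body into a dict."""
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_quality("liziniu", "cold_start_rl")` issues the same request as the curl command above. With a free API key you would presumably pass it as a header, but the header name is not documented here, so it is omitted.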
Higher-rated alternatives
agentscope-ai/Trinity-RFT
Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement...
OpenRLHF/OpenRLHF
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO &...
zjunlp/EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
huggingface/alignment-handbook
Robust recipes to align language models with human and AI preferences
hyunwoongko/nanoRLHF
nanoRLHF: from-scratch journey into how LLMs and RLHF really work.