line/sacpo
[NeurIPS 2024] SACPO (Stepwise Alignment for Constrained Policy Optimization)
This project helps AI developers and researchers refine large language models (LLMs) to be both helpful and safe. Given an existing LLM and training datasets focused on helpfulness and safety, it produces fine-tuned models along with evaluations of how well each model satisfies both objectives. It's ideal for those building conversational AI, chatbots, or assistants where output quality and user safety are critical.
No commits in the last 6 months.
Use this if you are developing or fine-tuning large language models and need to systematically improve their helpfulness while rigorously enforcing safety guidelines.
Not ideal if you are a business user looking for a no-code solution to customize an LLM, as this requires technical expertise in machine learning and Python.
Stars: 8
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/line/sacpo"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
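The same call can be issued from Python. A minimal sketch of building the endpoint URL, following the /{ecosystem}/{owner}/{repo} pattern visible in the curl example above (the response JSON schema is not documented here, so only URL construction and the fetch step are shown):

```python
from urllib.parse import quote

# Base of the quality API, taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL following the
    /{ecosystem}/{owner}/{repo} pattern from the curl example."""
    return f"{BASE}/{quote(ecosystem)}/{quote(owner)}/{quote(repo)}"

url = quality_url("transformers", "line", "sacpo")
# Fetch with e.g. urllib.request.urlopen(url); keep in mind the
# documented limit of 100 requests/day without an API key.
```

The path segments are percent-encoded with `quote` so that unusual owner or repo names do not break the URL; for the names shown here the encoding is a no-op.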
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards