line/sacpo
[NeurIPS 2024] SACPO (Stepwise Alignment for Constrained Policy Optimization)
This project helps AI developers and researchers refine large language models (LLMs) to be both helpful and safe. Given an existing LLM and training datasets focused on helpfulness and safety, it produces fine-tuned models along with evaluations of how well each model satisfies both objectives. It's ideal for those building conversational AI, chatbots, or assistants where output quality and user safety are critical.
No commits in the last 6 months.
Use this if you are developing or fine-tuning large language models and need to systematically improve their helpfulness while rigorously enforcing safety guidelines.
Not ideal if you are a business user looking for a no-code solution to customize an LLM, as this requires technical expertise in machine learning and Python.
Stars: 8
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Dec 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/line/sacpo"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
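The same call can be issued from Python. A minimal sketch of building the endpoint URL, following the /{ecosystem}/{owner}/{repo} pattern visible in the curl example above (the response JSON schema is not documented here, so only URL construction and the fetch step are shown):

```python
from urllib.parse import quote

# Base of the quality API, taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the per-repository endpoint URL following the
    /{ecosystem}/{owner}/{repo} pattern from the curl example."""
    return f"{BASE}/{quote(ecosystem)}/{quote(owner)}/{quote(repo)}"

url = quality_url("transformers", "line", "sacpo")
# Fetch with e.g. urllib.request.urlopen(url); keep in mind the
# documented limit of 100 requests/day without an API key.
```

The path segments are percent-encoded with `quote` so that unusual owner or repo names do not break the URL; for the names shown here the encoding is a no-op.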
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards