git-disl/Antidote

This is an unofficial re-implementation of "Antidote: Post-fine-tuning Safety Alignment for Large Language Models against Harmful Fine-tuning Attack" (ICML 2025).

Score: 14 / 100 (Experimental)

This project helps maintain the safety of large language models (LLMs) after they have been customized. It takes an LLM that may have learned harmful behaviors from user-provided fine-tuning data and removes the parameters responsible for those behaviors. The target user is anyone responsible for deploying and managing safe, customized LLMs for end users, especially in 'fine-tuning-as-a-service' scenarios.
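For intuition, here is a minimal, hypothetical sketch of that post-fine-tuning removal step in PyTorch. It assumes harmful weights can be scored by the saliency |weight x gradient| measured on a small batch of harmful examples; the paper's actual scoring and pruning procedure may differ, and `harmful_loss_fn` and `harmful_batch` are placeholders, not part of this repository.

    import torch

    def remove_harmful_weights(model, harmful_loss_fn, harmful_batch, prune_ratio=0.01):
        # Score each weight by |w * dL/dw| on harmful data: an illustrative
        # proxy for "contributes to harmful behavior", not the paper's exact rule.
        model.zero_grad()
        harmful_loss_fn(model, harmful_batch).backward()
        scores = torch.cat([
            (p.detach() * p.grad).abs().flatten()
            for p in model.parameters() if p.grad is not None
        ])
        k = max(1, int(prune_ratio * scores.numel()))
        threshold = torch.topk(scores, k).values.min()
        # Zero out (remove) the top-scoring weights in place; everything
        # below the threshold is kept unchanged.
        with torch.no_grad():
            for p in model.parameters():
                if p.grad is not None:
                    p.mul_(((p.detach() * p.grad).abs() < threshold).float())
        model.zero_grad()
        return model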

No commits in the last 6 months.

Use this if you are concerned that fine-tuning an LLM with user-provided data might accidentally or intentionally introduce harmful biases or responses.

Not ideal if you are looking for methods to prevent harmful fine-tuning during the initial alignment or fine-tuning stages, as Antidote is applied *after* fine-tuning.

Tags: LLM-safety · AI-governance · model-alignment · content-moderation · ethical-AI
Badges: No License · Stale (6m) · No Package · No Dependents
Maintenance: 2 / 25
Adoption: 4 / 25
Maturity: 8 / 25
Community: 0 / 25


Stars: 8
Forks:
Language: Shell
License: none
Last pushed: Jul 14, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/git-disl/Antidote"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
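If you prefer to consume the endpoint from code, a minimal Python equivalent of the curl call looks like the sketch below. The JSON response schema is not documented on this page, so the report is simply printed verbatim.

    import json
    import urllib.request

    # Same endpoint as the curl example; no API key needed for up to
    # 100 requests/day.
    URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/git-disl/Antidote"

    with urllib.request.urlopen(URL) as resp:
        report = json.load(resp)

    # The response fields are not documented here, so print the raw JSON.
    print(json.dumps(report, indent=2))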