boyiwei/alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
No commits in the last 6 months.
Stars: 89
Forks: 17
Language: Python
License: MIT
Category:
Last pushed: Mar 30, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/boyiwei/alignment-attribution-code"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
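The same request can be made from Python. The following is a minimal sketch using only the standard library. It assumes the endpoint returns JSON, which this page does not confirm, and it treats the three path segments after `/quality/` as a category, an owner, and a repository name, which is an interpretation of the curl URL above. The helper names `build_quality_url` and `fetch_quality` are hypothetical and not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL (hypothetical helper).

    Assumption: the path layout is /quality/<category>/<owner>/<repo>,
    inferred from the single curl example shown on the page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL, *parts])


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository (hypothetical helper).

    Assumption: the response body is JSON; adjust the parsing if it is not.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network call and counts against the daily request limit.
    data = fetch_quality("transformers", "boyiwei", "alignment-attribution-code")
    print(data)
```

Keeping URL construction separate from the network call makes it possible to check the path without spending any of the 100 unauthenticated requests per day.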
Higher-rated alternatives
jianghoucheng/AlphaEdit
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)
steering-vectors/steering-vectors
Steering vectors for transformer language models in Pytorch / Huggingface
kmeng01/memit
Mass-editing thousands of facts into a transformer memory (ICLR 2023)
jianghoucheng/AnyEdit
AnyEdit: Edit Any Knowledge Encoded in Language Models, ICML 2025
zjunlp/KnowledgeCircuits
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers