lena-lenkeit/llm-adversarial-attacks
Adversarial attacks on LLMs, for influencing outputs of hidden layer linear probes and steering generations.
No commits in the last 6 months.
Stars
2
Forks
—
Language
Python
License
—
Category
Last pushed
May 25, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/lena-lenkeit/llm-adversarial-attacks"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
QData/TextAttack
TextAttack 🐙 is a Python framework for adversarial attacks, data augmentation, and model...
ebagdasa/backdoors101
Backdoors Framework for Deep Learning and Federated Learning. A light-weight tool to conduct...
THUYimingLi/backdoor-learning-resources
A list of backdoor learning resources
zhangzp9970/MIA
Unofficial pytorch implementation of paper: Model Inversion Attacks that Exploit Confidence...
LukasStruppek/Plug-and-Play-Attacks
[ICML 2022 / ICLR 2024] Source code for our papers "Plug & Play Attacks: Towards Robust and...