HKU-TASR/Imperio
[IJCAI 2024] Imperio is an LLM-powered backdoor attack. It allows the adversary to issue language-guided instructions to control the victim model's prediction for arbitrary targets.
This project helps security researchers and AI auditors understand a new class of vulnerability in machine learning models, specifically image classifiers. Given a clean image dataset and language-guided instructions, it trains a 'backdoored' model whose predictions can be steered to attacker-chosen target classes via text commands, while the model still performs accurately on normal inputs.
No commits in the last 6 months.
Use this if you are researching advanced backdoor attacks on image classification models and need a tool to create and evaluate language-guided backdoor vulnerabilities.
Not ideal if you are looking for a defensive tool to detect or mitigate existing backdoors, or if your focus is on NLP model vulnerabilities.
Stars
44
Forks
4
Language
Python
License
MIT
Category
Last pushed
Feb 18, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/HKU-TASR/Imperio"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
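The same endpoint can also be called programmatically. A minimal Python sketch, assuming only the base URL shown in the curl example above (the JSON response schema is not documented here, so the fetch helper returns the decoded payload as-is):

```python
import json
from urllib.request import urlopen

# Base URL taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for an owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON quality record (100 requests/day without a key)."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("HKU-TASR", "Imperio"))
```

Calling `fetch_quality("HKU-TASR", "Imperio")` would retrieve the record for this repository, subject to the rate limits above.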
Higher-rated alternatives
TrustAI-laboratory/LMAP
LMAP (large language model mapper) is like NMAP for LLMs: an LLM vulnerability scanner and...
qroa/QROA
QROA: A Black-Box Query-Response Optimization Attack on LLMs
leondz/lm_risk_cards
Risks and targets for assessing LLMs & LLM vulnerabilities
zealscott/AutoProfiler
Source code for Automated Profile Inference with Language Model Agents
shreyansh26/Red-Teaming-Language-Models-with-Language-Models
A re-implementation of the "Red Teaming Language Models with Language Models" paper by Perez et al., 2022