HKU-TASR/Imperio
[IJCAI 2024] Imperio is an LLM-powered backdoor attack. It allows the adversary to issue language-guided instructions to control the victim model's prediction for arbitrary targets.
This project helps security researchers and AI auditors understand a new class of vulnerability in machine learning models, specifically image classifiers. Given a clean image dataset and language-guided instructions, it trains a 'backdoored' model whose predictions can be steered to attacker-chosen target classes via text commands, while the model still performs accurately on normal inputs.
No commits in the last 6 months.
Use this if you are researching advanced backdoor attacks on image classification models and need a tool to create and evaluate language-guided backdoor vulnerabilities.
Not ideal if you are looking for a defensive tool to detect or mitigate existing backdoors, or if your focus is on NLP model vulnerabilities.
Stars
44
Forks
4
Language
Python
License
MIT
Category
Last pushed
Feb 18, 2025
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/HKU-TASR/Imperio"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
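The same endpoint can also be called programmatically. A minimal Python sketch, assuming only the base URL shown in the curl example above (the JSON response schema is not documented here, so the fetch helper returns the decoded payload as-is):

```python
import json
from urllib.request import urlopen

# Base URL taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-API URL for an owner/repo pair."""
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON quality record (100 requests/day without a key)."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("HKU-TASR", "Imperio"))
```

Calling `fetch_quality("HKU-TASR", "Imperio")` would retrieve the record for this repository, subject to the rate limits above.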
Higher-rated alternatives
TrustAI-laboratory/LMAP
LMAP (large language model mapper) is like NMAP for LLMs: an LLM vulnerability scanner and...
qroa/QROA
QROA: A Black-Box Query-Response Optimization Attack on LLMs
leondz/lm_risk_cards
Risks and targets for assessing LLMs & LLM vulnerabilities
zealscott/AutoProfiler
Source code for Automated Profile Inference with Language Model Agents
shreyansh26/Red-Teaming-Language-Models-with-Language-Models
A re-implementation of the "Red Teaming Language Models with Language Models" paper by Perez et al., 2022