qroa/QROA

QROA: A Black-Box Query-Response Optimization Attack on LLMs


Emerging

QROA helps security researchers and AI safety engineers evaluate the robustness of Large Language Models (LLMs). Given a malicious instruction, it repeatedly queries the target model and searches for a 'trigger' that, when added to the instruction, makes the LLM produce harmful content. The output includes the optimized triggers and logs detailing the attack process and its success.

No commits in the last 6 months.

Use this if you need to test how easily an LLM can be manipulated into generating harmful or unintended content without needing internal model access.

Not ideal if you are looking for a defensive tool to prevent LLMs from generating harmful content, as this tool is designed for offensive testing.

AI Safety, Red Teaming, LLM Vulnerability Testing, Content Moderation Testing, Adversarial AI

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 5 / 25

Maturity 16 / 25

Community 10 / 25



Language: Python

License: MIT

Higher-rated alternatives

TrustAI-laboratory/LMAP

LMAP (large language model mapper) is like NMAP for LLM, is an LLM Vulnerability Scanner and...

HKU-TASR/Imperio

[IJCAI 2024] Imperio is an LLM-powered backdoor attack. It allows the adversary to issue...

leondz/lm_risk_cards

Risks and targets for assessing LLMs & LLM vulnerabilities

zealscott/AutoProfiler

Source code for Automated Profile Inference with Language Model Agents

shreyansh26/Red-Teaming-Language-Models-with-Language-Models

A re-implementation of the "Red Teaming Language Models with Language Models" paper by Perez et al., 2022
