wangywUST/OutputJailbreak
Repository for our paper "Frustratingly Easy Jailbreak of Large Language Models via Output Prefix Attacks". https://www.researchsquare.com/article/rs-4385503/latest
This project offers methods to test the security vulnerabilities of large language models (LLMs). It takes a malicious request, like generating harmful content, and applies simple techniques to bypass the model's safety filters, producing the harmful output. This is useful for AI security researchers, red teamers, and developers responsible for evaluating and hardening LLMs against misuse.
No commits in the last 6 months.
Use this if you need to quickly and easily assess how susceptible a black-box large language model is to generating unsafe or malicious content.
Not ideal if you are looking for methods to improve the safety mechanisms of an LLM or want to prevent jailbreaks rather than perform them.
Stars
9
Forks
—
Language
Jupyter Notebook
License
—
Category
Last pushed
Jun 19, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/wangywUST/OutputJailbreak"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
wuyoscar/ISC-Bench
Internal Safety Collapse: Turning LLMs into a "Jailbroken State" Without "a Jailbreak Attack".
yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods...
yiksiu-chan/SpeakEasy
[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
xirui-li/DrAttack
Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...
tmlr-group/DeepInception
[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"