wangywUST/OutputJailbreak

Repository for our paper "Frustratingly Easy Jailbreak of Large Language Models via Output Prefix Attacks". https://www.researchsquare.com/article/rs-4385503/latest

/ 100

Experimental

This project offers methods to test the security vulnerabilities of large language models (LLMs). It takes a malicious request, like generating harmful content, and applies simple techniques to bypass the model's safety filters, producing the harmful output. This is useful for AI security researchers, red teamers, and developers responsible for evaluating and hardening LLMs against misuse.

No commits in the last 6 months.

Use this if you need to quickly and easily assess how susceptible a black-box large language model is to generating unsafe or malicious content.

Not ideal if you are looking for methods to improve the safety mechanisms of an LLM or want to prevent jailbreaks rather than perform them.

AI security LLM red-teaming vulnerability testing AI safety evaluation model robustness

No License Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 0 / 25

How are scores calculated?

Stars

Forks

—

Language

Jupyter Notebook

License

—

Higher-rated alternatives

wuyoscar/ISC-Bench

Internal Safety Collapse: Turning LLMs into a "Jailbroken State" Without "a Jailbreak Attack".

yueliu1999/Awesome-Jailbreak-on-LLMs

Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods...

yiksiu-chan/SpeakEasy

[ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

xirui-li/DrAttack

Official implementation of paper: DrAttack: Prompt Decomposition and Reconstruction Makes...

tmlr-group/DeepInception

[arXiv:2311.03191] "DeepInception: Hypnotize Large Language Model to Be Jailbreaker"

Explore LLM Tools

All categories Trending LLM Tool directory Insights