agencyenterprise/PromptInject

PromptInject is a framework that assembles prompts in a modular fashion to provide a quantitative analysis of the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Awards @ NeurIPS ML Safety Workshop 2022

55
/ 100
Established

Implements mask-based iterative adversarial composition to systematically evaluate two attack vectors: goal hijacking (redirecting model behavior to execute malicious instructions) and prompt leaking (extracting hidden system prompts). The framework enables researchers to quantitatively measure LLM vulnerability by testing handcrafted adversarial inputs against production models like GPT-3, revealing how simple user interactions can exploit stochastic model behavior.

465 stars and 88 monthly downloads. No commits in the last 6 months. Available on PyPI.

Stale 6m
Maintenance 0 / 25
Adoption 14 / 25
Maturity 25 / 25
Community 16 / 25

How are scores calculated?

Stars

465

Forks

44

Language

Python

License

MIT

Last pushed

Feb 26, 2024

Monthly downloads

88

Commits (30d)

0

Dependencies

4

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/prompt-engineering/agencyenterprise/PromptInject"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.