AI-secure/AgentPoison
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
Introduces gradient-guided trigger optimization targeting RAG embedders (BERT, DPR, ANCE, BGE, REALM, ORQA) with coherence filtering and configurable poisoning strategies across multiple agent architectures (autonomous driving, QA, EHR systems). Demonstrates backdoor attacks on agent memory and knowledge retrieval by crafting adversarial passage tokens that manipulate embedding similarity scores while maintaining semantic coherence through perplexity filtering.
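The core idea described above, iteratively substituting trigger tokens to pull a poisoned passage's embedding toward target queries while a coherence filter rejects disfluent candidates, can be sketched in miniature. This is a toy illustration, not the AgentPoison implementation: it uses random stand-in embeddings instead of a real RAG embedder, greedy per-position search instead of gradient guidance, and a hypothetical variance-based proxy in place of a language-model perplexity score.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 50, 8
token_emb = rng.normal(size=(VOCAB, DIM))  # stand-in token embedding table
target = rng.normal(size=DIM)              # stand-in target query embedding
target /= np.linalg.norm(target)

def embed(trigger):
    """Bag-of-tokens passage embedding (stand-in for a retriever like DPR/BGE)."""
    v = token_emb[trigger].mean(axis=0)
    return v / np.linalg.norm(v)

def fluency(trigger):
    """Hypothetical stand-in for negative perplexity; real code scores with an LM."""
    return -np.var(trigger)

def optimize(trigger, steps=5, fluency_floor=-300.0):
    """Greedily replace each trigger token to raise similarity to the target,
    skipping any candidate that the coherence filter rejects."""
    trigger = list(trigger)
    for _ in range(steps):
        for pos in range(len(trigger)):
            best, best_sim = trigger[pos], embed(trigger) @ target
            for cand in range(VOCAB):
                trial = trigger[:pos] + [cand] + trigger[pos + 1:]
                if fluency(trial) < fluency_floor:
                    continue  # coherence filter: reject disfluent candidates
                sim = embed(trial) @ target
                if sim > best_sim:
                    best, best_sim = cand, sim
            trigger[pos] = best
    return trigger

init = [1, 2, 3, 4]
opt = optimize(init)
print(f"similarity before: {embed(init) @ target:.3f}, after: {embed(opt) @ target:.3f}")
```

The actual attack replaces the greedy vocabulary scan with gradient information from the embedder (a HotFlip-style first-order approximation over candidate tokens) and the variance proxy with real perplexity filtering, but the loop structure, maximize retrieval similarity subject to a fluency constraint, is the same.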
203 stars. No commits in the last 6 months.
Stars: 203
Forks: 27
Language: Python
License: MIT
Category:
Last pushed: Apr 12, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/AI-secure/AgentPoison"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
LLAMATOR-Core/llamator
Red-teaming Python framework for testing chatbots and GenAI systems.
sleeepeer/PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented...
JuliusHenke/autopentest
CLI enabling more autonomous black-box penetration tests using Large Language Models (LLMs)
kelkalot/simpleaudit
Lets you red-team your AI systems through adversarial probing. It is simple, effective, and...
SecurityClaw/SecurityClaw
A modular, skill-based autonomous Security Operations Center (SOC) agent that monitors...