AI-secure/AgentPoison
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
Introduces gradient-guided trigger optimization targeting RAG embedders (BERT, DPR, ANCE, BGE, REALM, ORQA) with coherence filtering and configurable poisoning strategies across multiple agent architectures (autonomous driving, QA, EHR systems). Demonstrates backdoor attacks on agent memory and knowledge retrieval by crafting adversarial passage tokens that manipulate embedding similarity scores while maintaining semantic coherence through perplexity filtering.
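The core idea described above, iteratively substituting trigger tokens to pull a poisoned passage's embedding toward target queries while a coherence filter rejects disfluent candidates, can be sketched in miniature. This is a toy illustration, not the AgentPoison implementation: it uses random stand-in embeddings instead of a real RAG embedder, greedy per-position search instead of gradient guidance, and a hypothetical variance-based proxy in place of a language-model perplexity score.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 50, 8
token_emb = rng.normal(size=(VOCAB, DIM))  # stand-in token embedding table
target = rng.normal(size=DIM)              # stand-in target query embedding
target /= np.linalg.norm(target)

def embed(trigger):
    """Bag-of-tokens passage embedding (stand-in for a retriever like DPR/BGE)."""
    v = token_emb[trigger].mean(axis=0)
    return v / np.linalg.norm(v)

def fluency(trigger):
    """Hypothetical stand-in for negative perplexity; real code scores with an LM."""
    return -np.var(trigger)

def optimize(trigger, steps=5, fluency_floor=-300.0):
    """Greedily replace each trigger token to raise similarity to the target,
    skipping any candidate that the coherence filter rejects."""
    trigger = list(trigger)
    for _ in range(steps):
        for pos in range(len(trigger)):
            best, best_sim = trigger[pos], embed(trigger) @ target
            for cand in range(VOCAB):
                trial = trigger[:pos] + [cand] + trigger[pos + 1:]
                if fluency(trial) < fluency_floor:
                    continue  # coherence filter: reject disfluent candidates
                sim = embed(trial) @ target
                if sim > best_sim:
                    best, best_sim = cand, sim
            trigger[pos] = best
    return trigger

init = [1, 2, 3, 4]
opt = optimize(init)
print(f"similarity before: {embed(init) @ target:.3f}, after: {embed(opt) @ target:.3f}")
```

The actual attack replaces the greedy vocabulary scan with gradient information from the embedder (a HotFlip-style first-order approximation over candidate tokens) and the variance proxy with real perplexity filtering, but the loop structure, maximize retrieval similarity subject to a fluency constraint, is the same.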
203 stars. No commits in the last 6 months.
Stars: 203
Forks: 27
Language: Python
License: MIT
Category:
Last pushed: Apr 12, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/AI-secure/AgentPoison"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
LLAMATOR-Core/llamator
Red-teaming Python framework for testing chatbots and GenAI systems.
sleeepeer/PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented...
JuliusHenke/autopentest
CLI enabling more autonomous black-box penetration tests using Large Language Models (LLMs)
kelkalot/simpleaudit
Lets you red-team your AI systems through adversarial probing. It is simple, effective, and...
SecurityClaw/SecurityClaw
A modular, skill-based autonomous Security Operations Center (SOC) agent that monitors...