sleeepeer/PoisonedRAG
[USENIX Security 2025] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models
Implements both black-box (LM-targeted) and white-box (HotFlip) poisoning attacks on retrieval corpora, targeting popular RAG retriever-LLM pairs including Contriever with GPT-3.5/4, PaLM 2, and LLaMA. Evaluates attacks across BEIR benchmark datasets (NQ, HotpotQA, MS-MARCO) with configurable hyperparameters for adversarial document generation and ranking manipulation. Integrates with Hugging Face model APIs and supports local model deployment via FastChat for reproducible adversarial evaluation.
242 stars.
Stars
242
Forks
38
Language
Python
License
MIT
Category
Last pushed
Jan 27, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/sleeepeer/PoisonedRAG"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
LLAMATOR-Core/llamator
Red Teaming python-framework for testing chatbots and GenAI systems.
JuliusHenke/autopentest
CLI enabling more autonomous black-box penetration tests using Large Language Models (LLMs)
kelkalot/simpleaudit
Allows to red-team your AI systems through adversarial probing. It is simple, effective, and...
SecurityClaw/SecurityClaw
A modular, skill-based autonomous Security Operations Center (SOC) agent that monitors...
AI-secure/AgentPoison
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or...