llamator and redteam-ai-benchmark
These are complementary tools: LLAMATOR-Core provides a framework for executing red team tests against chatbots and GenAI systems, while redteam-ai-benchmark supplies a structured evaluation methodology and benchmark dataset for assessing LLM vulnerabilities in offensive security contexts.
About llamator
LLAMATOR-Core/llamator
Red Teaming python-framework for testing chatbots and GenAI systems.
Provides modular attack vectors targeting prompt injection, jailbreaks, system prompt leakage, and resource exhaustion across LLMs, RAGs, and vision models. Supports multiple client integrations including LangChain, OpenAI-compatible APIs, and web interfaces (Selenium, Telethon), with extensible custom attack definitions. Generates detailed audit trails in Excel/CSV formats and DOCX test reports mapped to OWASP LLM vulnerability classifications.
About redteam-ai-benchmark
toxy4ny/redteam-ai-benchmark
Red Team AI Benchmark: Evaluating Uncensored LLMs for Offensive Security
Scores updated daily from GitHub, PyPI, and npm data. How scores work