dreadnode/AIRTBench-Code

Code Repository for: AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models

53 / 100 (Established)

Implements an autonomous red teaming agent using a modular harness architecture that executes Python code within isolated Docker containers to solve AI/ML CTF challenges. The agent integrates with the Dreadnode Strikes platform and uses the Rigging framework. It receives challenge notebooks and iteratively attempts exploits through a Jupyter kernel feedback loop, bounded by configurable step limits and timeouts. Supports filtering for LLM-based challenges and provides standardized evaluation metrics for measuring adversarial capabilities across different language models.
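The feedback loop described above can be pictured with a minimal sketch. This is not the repository's code: `run_agent_loop`, `propose_code`, and `is_solved` are hypothetical names, the "model" is a scripted stand-in, and plain in-process `exec` replaces the Docker container and Jupyter kernel, so it offers no isolation. It only illustrates how a step limit and a timeout bound an execute-and-observe loop.

```python
import contextlib
import io
import time
from typing import Callable, Optional


def run_agent_loop(
    propose_code: Callable[[str], str],   # hypothetical: maps last output to next code
    is_solved: Callable[[str], bool],     # hypothetical: checks output for success
    max_steps: int = 10,
    timeout_s: float = 60.0,
) -> Optional[int]:
    """Return the 1-based step at which the challenge was solved, or None."""
    namespace: dict = {}   # persists across steps, loosely like kernel state
    feedback = ""          # output (or error) from the previous step
    deadline = time.monotonic() + timeout_s

    for step in range(1, max_steps + 1):
        # The timeout is only checked between steps in this sketch.
        if time.monotonic() > deadline:
            return None

        code = propose_code(feedback)
        buffer = io.StringIO()
        try:
            # Assumption: in-process exec stands in for a sandboxed kernel.
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)
            feedback = buffer.getvalue()
        except Exception as exc:
            # Errors are fed back so the next attempt can react to them.
            feedback = f"{type(exc).__name__}: {exc}"

        if is_solved(feedback):
            return step
    return None


# Scripted stand-in for a language model: one failing attempt, then two that work.
attempts = iter(["print(undefined_name)", "flag = 'FLAG{demo}'", "print(flag)"])
result = run_agent_loop(lambda feedback: next(attempts), lambda out: "FLAG{" in out)
print(result)
```

In this toy run the first attempt raises an error that is fed back as text, the second defines a variable that persists in the shared namespace, and the third prints it. A real harness would send the code to an isolated kernel and enforce the timeout during execution as well.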

No Package, No Dependents
Maintenance 13 / 25
Adoption 9 / 25
Maturity 15 / 25
Community 16 / 25
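As a quick arithmetic check, the four category scores above add up to the overall figure. The snippet assumes the overall score is the plain sum of the categories, each out of 25. That matches the numbers shown here, but the page does not state the scoring method.

```python
# Category scores as listed on the page, each out of 25.
subscores = {"Maintenance": 13, "Adoption": 9, "Maturity": 15, "Community": 16}
max_per_category = 25

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(subscores.values())
maximum = max_per_category * len(subscores)

print(f"{total} / {maximum}")
```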


Stars: 93
Forks: 14
Language: Jupyter Notebook
License: Apache-2.0
Last pushed: Mar 11, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/dreadnode/AIRTBench-Code"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
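The same request can be made from Python using only the standard library. This is a sketch under assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which the page does not confirm. How a key would be passed is not documented here, so the sketch uses keyless access only.

```python
import json
import urllib.request
from urllib.parse import quote

# Endpoint prefix taken from the curl command on the page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/agents"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout_s: float = 10.0):
    """Fetch the quality record. Assumption: the body is JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access; fetch_quality performs the request.
print(quality_url("dreadnode", "AIRTBench-Code"))
```

Calling `fetch_quality("dreadnode", "AIRTBench-Code")` would issue the request. With the keyless limit of 100 requests/day, caching the result locally is a sensible precaution for repeated runs.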