AI Debate Arenas LLM Tools

Interactive platforms where AI models or humans compete in structured debates with scoring/judging. Includes multi-model comparison tools, live debate staging, and adversarial benchmarking. Does NOT include general competitive game frameworks, coding competitions, or security red-team tools without debate mechanics.

There are 42 AI debate arena tools tracked. The highest-rated is betagouv/ComparIA, which scores 48/100 and has 63 stars.

Get all 42 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=ai-debate-arenas&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
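For scripted use, the same request can be made from Python. The sketch below is a minimal example using only the standard library: it rebuilds the query string from the curl command above and decodes the response as JSON. The helper names `build_url` and `fetch_quality` are hypothetical. The shape of the returned JSON is not documented here, so the code returns the decoded payload as-is. Whether the API accepts a `limit` above 20 is also an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="llm-tools", subcategory="ai-debate-arenas", limit=20):
    """Build the dataset URL with the same query parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_quality(url, timeout=30):
    """Fetch the URL and decode the body as JSON.

    The response structure is an assumption (not documented on this page),
    so the decoded object is returned without further interpretation.
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (performs a network request, counted against the daily quota):
#   data = fetch_quality(build_url())
#   print(json.dumps(data, indent=2)[:500])
```

Keeping URL construction separate from the network call makes it easy to inspect the exact request before spending one of the 100 keyless requests per day.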

#	Tool	Score (/100)	Tier	Stars	Language
1	betagouv/ComparIA Open source LLM arena created by the French Government	48	Emerging	63	Jupyter Notebook
2	liuxiaotong/ai-dataset-radar Multi-source async competitive intelligence engine for AI training data...	46	Emerging	2	Python
3	Skytliang/Multi-Agents-Debate MAD: The first work to explore Multi-Agent Debate with Large Language Models :D	42	Emerging	532	Python
4	llm-ring/lmring Open-source, self-hostable LLM arena with model compare, voting, and leaderboards	40	Emerging	8	TypeScript
5	Arnoldlarry15/ARES-Dashboard AI Red Team Operations Console	40	Emerging	14	TypeScript
6	khoren93/ai-debates Orchestrate epic battles between 600+ AI models (GPT-5, Gemini 3, DeepSeek...	31	Emerging	11	TypeScript
7	OpenLLM-Council/dev-council An experimental framework for building collaborative coding agents that...	29	Experimental	3	Python
8	YerbaPage/SWE-Debate SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution	29	Experimental	25	Python
9	rd-serendipity/ai-debate-arena AI Debate Arena: Streamlit app for AI model debates. Features multi-model...	26	Experimental	7	Python
10	jhammant/agent-drift Stress-test AI agents for goal drift and system prompt violations. Inspired...	26	Experimental	5	HTML
11	debate/debate-ai.com Debate AI enables debaters in PF, LD & Policy to streamline AI research...	26	Experimental	7	TypeScript
12	jeremysball/go-pair generate adversarial prompts for LLMs	25	Experimental	1	Go
13	aramirez-maza/beto-framework BETO formalizes the ignorance of an AI. Materializes raw semantic intent...	24	Experimental	2	Python
14	13120740298z-lang/AI-Tech-Radar 🤖 Automated AI technology intelligence platform — tracks GitHub AI projects,...	23	Experimental	1	JavaScript
15	sanifhimani/llm-colosseum AI models fight each other in a pixel arena every day. They decide what to...	23	Experimental	1	JavaScript
16	lukeslp/consensus Multi-model research debate engine — 8+ language models independently...	23	Experimental	1	HTML
17	qingni/TechSentry 🛡️ AI-powered tech intelligence tool that auto-tracks GitHub repos, Hacker...	22	Experimental	—	Python
18	dinesh-git17/debate-lab Watch AI models debate any topic in real-time. ChatGPT and Grok argue,...	22	Experimental	3	TypeScript
19	homeofe/ai-security-arena AI Security Arena: Interactive web interface for AI-powered Red Team vs Blue...	22	Experimental	—	TypeScript
20	Aimaghsoodi/FailSafe AI Failure Intelligence — predict, detect, and recover from AI agent failures	22	Experimental	—	HTML
21	Firmislabs/ai-inventory Detect AI frameworks, LLM dependencies, and model files in any project. One...	22	Experimental	—	TypeScript
22	the-meta-value/The-Perfect-Storm tracking AI system failures with solar weather	20	Experimental	1	TeX
23	florykhan/TelusGuardAI AI-powered network impact analyzer. Natural-language queries → multi-agent...	19	Experimental	—	JavaScript
24	EdouardZemb/test-framework CLI for TRA QA process optimization: Jira/Squash triage, testability...	19	Experimental	—	Rust
25	lechmazur/debate Adversarial multi-turn benchmark for LLM debate quality, using side-swapped...	18	Experimental	8	—
26	netlify/nextjs-sentinel Monitors Next.js releases for relevance to Netlify	17	Experimental	4	TypeScript
27	SurgeCLI/Surge Surge is a lightweight self-learning AI observability and remediation agent...	17	Experimental	2	Python
28	ethicals7s/EchoArena Local LLM debate arena — make two models battle any topic offline, third...	15	Experimental	—	Go
29	bassrehab/artemis-agents A production-ready framework for structured multi-agent debates with...	15	Experimental	—	HTML
30	ayushdwivedi001/AI-Radar-Pulse An autonomous high-availability framework for tracking, ranking, and...	15	Experimental	1	—
31	armsp/AIFU AI flub ups	14	Experimental	1	JavaScript
32	Gliangquan/awesome-ai-radar Daily curated AI, LLM, and agent project radar from GitHub	14	Experimental	—	JavaScript
33	samuel-dobrancin-qa/ai-content-quality-framework Structured framework for evaluating AI content generator quality across six...	14	Experimental	—	—
34	Rak2k6/gig-audit Fair Gig Guardian is an AI-powered platform that analyzes gig economy...	14	Experimental	—	TypeScript
35	gregorydouglasquarles/lavaflow-site AI-driven predictive safety ecosystem (qView) and multimedia architecture....	14	Experimental	—	HTML
36	VAMP-NEER/release-radar-ai 🔍 Ultimate Free GitHub Trend Tracker 2026 🚀 | AI-Powered Repo & Dev Team...	14	Experimental	—	—
37	Swap-24/ARGUS Real-time AI debate arena. Argue live against opponents while an ML pipeline...	14	Experimental	—	JavaScript
38	konradhy/battlearena A proof of concept illustrating how AI can enhance games	14	Experimental	1	TypeScript
39	WaterGorilla/agent-arena Compete with AI agents in a UFC-style coding battle, solving tasks live...	11	Experimental	—	—
40	Gouthambalaji03/AI-Debate-Arena AI Debate Arena is an interactive web experience that stages rapid-fire...	11	Experimental	—	TypeScript
41	Privacy-Engineering-CMU/ai-risk-prettified A prettified page for MIT's AI Risk Database	11	Experimental	—	HTML
42	andronov04/aiarena Client-side AI arena for comparing 1000+ models across 68+ providers. No...	10	Experimental	4	TypeScript