mich1803/Codenames-LLM

Building an AI team to play Codenames using top Large Language Models (LLMs), evaluating performance, and pitting them against each other. Explore their strategy and capabilities in this interactive competition!

16 / 100

Experimental

No commits in the last 6 months.

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 2 / 25

Maturity 1 / 25

Community 13 / 25

Stars

Forks

Language: Jupyter Notebook

License: —

Category: llm-comparison-evaluation

Last pushed: Dec 27, 2024

Commits (30d)

GitHub

LLM Comparison Evaluation · 96 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/mich1803/Codenames-LLM"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
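
The same request can be made from Python. The sketch below uses only the standard library and the endpoint shown in the curl command above. The helper names `quality_url` and `fetch_quality` are hypothetical, and the code assumes the response body is JSON; the response schema is not documented on this page, so the result is returned as-is rather than unpacked into named fields.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual repo names stay valid.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality data for a repository without an API key.

    Assumption: the endpoint returns a JSON body. Its fields are not
    documented here, so the decoded object is returned unchanged.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("mich1803", "Codenames-LLM")` issues the same GET request as the curl command. Keyless access is limited to 100 requests/day, so a script that polls many repositories may want to cache results locally.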

Higher-rated alternatives

open-compass/opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral,...

IBM/unitxt

🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the...

lean-dojo/LeanDojo

Tool for data extraction and interacting with Lean programmatically.

GoodStartLabs/AI_Diplomacy

Frontier Models playing the board game Diplomacy.

MigoXLab/LMeterX

A general-purpose API load testing platform that supports LLM services and business HTTP...
