GoodAI/goodai-ltm-benchmark

A library for benchmarking the long-term memory and continual learning capabilities of LLM-based agents, with all the tests and code you need to evaluate your own agents. See more in the blog post:

/ 100

Emerging

No commits in the last 6 months.

Stale (6 months) · No package · No dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 9 / 25

Community 15 / 25


Language: HTML

License: none listed

Category: domain-specific-benchmarks

Last pushed: Dec 17, 2024


Domain-Specific Benchmarks · 141 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/GoodAI/goodai-ltm-benchmark"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.

Featured in

You're Shipping AI You Can't Measure

Higher-rated alternatives

xlang-ai/OSWorld

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

bigcode-project/bigcodebench

[ICLR'25] BigCodeBench: Benchmarking Code Generation Towards AGI

sierra-research/tau2-bench

τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment

THUDM/AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

swefficiency/swefficiency

Benchmark harness and code for "SWE-fficiency: Can Language Models Optimize Real World...
