LLMeBench and LLF-Bench

The two projects compete in the same space while serving distinct purposes: LLMeBench is a general-purpose framework for benchmarking large language models across tasks, while LLF-Bench specializes in evaluating learning agents guided by natural-language feedback.

Metric          LLMeBench               LLF-Bench
Overall score   50 (Established)        45 (Emerging)
Maintenance     2/25                    2/25
Adoption        12/25                   9/25
Maturity        17/25                   16/25
Community       19/25                   18/25
Stars           105                     95
Forks           21                      18
Downloads       17                      n/a
Commits (30d)   0                       0
Language        Python                  Python
License         none                    MIT
Flags           No License; Stale 6m    Stale 6m; No Package; No Dependents
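The overall score appears to be the simple sum of the four category subscores, each out of 25 (2 + 12 + 17 + 19 = 50 for LLMeBench; 2 + 9 + 16 + 18 = 45 for LLF-Bench). A minimal sketch of that scheme in Python, with hypothetical names; the actual scoring code is not published here:

    # Hypothetical reconstruction of the composite score: the page's totals
    # (50 and 45) match the sum of the four 25-point category subscores.
    from dataclasses import dataclass

    @dataclass
    class Subscores:
        maintenance: int  # 0-25
        adoption: int     # 0-25
        maturity: int     # 0-25
        community: int    # 0-25

        def total(self) -> int:
            # Composite score out of 100 as a plain sum of the categories.
            return self.maintenance + self.adoption + self.maturity + self.community

    llmebench = Subscores(maintenance=2, adoption=12, maturity=17, community=19)
    llf_bench = Subscores(maintenance=2, adoption=9, maturity=16, community=18)

    assert llmebench.total() == 50  # "Established"
    assert llf_bench.total() == 45  # "Emerging"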

About LLMeBench

Repository: qcri/LLMeBench
Description: Benchmarking Large Language Models

About LLF-Bench

Repository: microsoft/LLF-Bench
Description: A benchmark for evaluating learning agents based solely on language feedback

Scores updated daily from GitHub, PyPI, and npm data.
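The raw repository metrics above can be reproduced from GitHub's public REST API; the field names (stargazers_count, forks_count, language, license) are documented API fields. A minimal sketch using the third-party requests package:

    # Fetch the raw repository metrics used above from GitHub's public REST API.
    # Requires `pip install requests`; unauthenticated calls are rate-limited.
    import requests

    def repo_stats(slug: str) -> dict:
        """Return stars, forks, language, and license for a GitHub repository."""
        resp = requests.get(f"https://api.github.com/repos/{slug}", timeout=10)
        resp.raise_for_status()
        data = resp.json()
        return {
            "stars": data["stargazers_count"],
            "forks": data["forks_count"],
            "language": data["language"],
            # `license` is null when no license is detected (as for LLMeBench).
            "license": (data["license"] or {}).get("spdx_id"),
        }

    for slug in ("qcri/LLMeBench", "microsoft/LLF-Bench"):
        print(slug, repo_stats(slug))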