LLMeBench and LLF-Bench

The two projects compete in the same space while serving distinct purposes: LLMeBench is a general-purpose framework for benchmarking large language models across tasks, while LLF-Bench specializes in evaluating learning agents guided by natural-language feedback.

Metric          LLMeBench               LLF-Bench
Overall score   50 (Established)        45 (Emerging)
Maintenance     2/25                    2/25
Adoption        12/25                   9/25
Maturity        17/25                   16/25
Community       19/25                   18/25
Stars           105                     95
Forks           21                      18
Downloads       17                      n/a
Commits (30d)   0                       0
Language        Python                  Python
License         none                    MIT
Flags           No License; Stale 6m    Stale 6m; No Package; No Dependents
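The overall score appears to be the simple sum of the four category subscores, each out of 25 (2 + 12 + 17 + 19 = 50 for LLMeBench; 2 + 9 + 16 + 18 = 45 for LLF-Bench). A minimal sketch of that scheme in Python, with hypothetical names; the actual scoring code is not published here:

    # Hypothetical reconstruction of the composite score: the page's totals
    # (50 and 45) match the sum of the four 25-point category subscores.
    from dataclasses import dataclass

    @dataclass
    class Subscores:
        maintenance: int  # 0-25
        adoption: int     # 0-25
        maturity: int     # 0-25
        community: int    # 0-25

        def total(self) -> int:
            # Composite score out of 100 as a plain sum of the categories.
            return self.maintenance + self.adoption + self.maturity + self.community

    llmebench = Subscores(maintenance=2, adoption=12, maturity=17, community=19)
    llf_bench = Subscores(maintenance=2, adoption=9, maturity=16, community=18)

    assert llmebench.total() == 50  # "Established"
    assert llf_bench.total() == 45  # "Emerging"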

About LLMeBench

Repository: qcri/LLMeBench
Description: Benchmarking Large Language Models

About LLF-Bench

Repository: microsoft/LLF-Bench
Description: A benchmark for evaluating learning agents based solely on language feedback

Scores updated daily from GitHub, PyPI, and npm data.
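The raw repository metrics above can be reproduced from GitHub's public REST API; the field names (stargazers_count, forks_count, language, license) are documented API fields. A minimal sketch using the third-party requests package:

    # Fetch the raw repository metrics used above from GitHub's public REST API.
    # Requires `pip install requests`; unauthenticated calls are rate-limited.
    import requests

    def repo_stats(slug: str) -> dict:
        """Return stars, forks, language, and license for a GitHub repository."""
        resp = requests.get(f"https://api.github.com/repos/{slug}", timeout=10)
        resp.raise_for_status()
        data = resp.json()
        return {
            "stars": data["stargazers_count"],
            "forks": data["forks_count"],
            "language": data["language"],
            # `license` is null when no license is detected (as for LLMeBench).
            "license": (data["license"] or {}).get("spdx_id"),
        }

    for slug in ("qcri/LLMeBench", "microsoft/LLF-Bench"):
        print(slug, repo_stats(slug))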