LLM Evaluation Benchmarking Transformer Models

There are 6 LLM evaluation and benchmarking projects tracked in this category. The highest-rated is allenai/RL4LMs, scoring 38/100 with 2,382 GitHub stars.

Get all 6 projects as JSON:

    curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-evaluation-benchmarking&limit=20"

The API is open to everyone at 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
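The curl call above can also be made from Python. The sketch below builds the same query URL and fetches the results; note that the JSON field names used in the loop ("name", "score") are assumptions about the response shape, not documented fields, so check the actual payload before relying on them.

```python
# Hedged sketch of calling the quality endpoint shown above.
# Field names in the final loop are assumptions about the JSON shape.
import json
import urllib.parse
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain, subcategory, limit=20):
    """Assemble the query URL with properly encoded parameters."""
    params = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE}?{params}"


def fetch_projects(url):
    """Fetch and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    url = build_url("transformers", "llm-evaluation-benchmarking")
    # Assumed: the endpoint returns a list of project objects.
    for project in fetch_projects(url):
        print(project.get("name"), project.get("score"))
```

Keeping URL construction in its own function makes the query parameters easy to vary (e.g. a different subcategory) without string concatenation mistakes.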

#  Model                            Score  Tier          Description
1  allenai/RL4LMs                   38     Emerging      A modular RL library to fine-tune language models to human preferences
2  cloudguruab/modsysML             34     Emerging      Human reinforcement learning (RLHF) framework for AI models. Evaluate and...
3  modal-labs/stopwatch             29     Experimental  A tool for benchmarking LLMs on Modal
4  Mya-Mya/CBF-LLM                  28     Experimental  "CBF-LLM: Safe Control for LLM Alignment"
5  Adora-Foundation/llm-energy-lab  19     Experimental  Web application for benchmarking and comparing LLM behaviour, energy and...
6  mrconter1/PullRequestBenchmark   14     Experimental  Evaluating LLMs performance in PR reviews as an indicator for their...
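Once fetched, the project records above can be grouped by tier locally. The sketch below hard-codes the six rows from the table; the dict keys ("name", "score", "tier") are an assumption about how the API represents each project, so adapt them to the real JSON.

```python
# Hedged sketch: bucket the six projects by tier, sorted by score.
# The "name"/"score"/"tier" keys are assumed, not confirmed API fields.
from collections import defaultdict

projects = [
    {"name": "allenai/RL4LMs", "score": 38, "tier": "Emerging"},
    {"name": "cloudguruab/modsysML", "score": 34, "tier": "Emerging"},
    {"name": "modal-labs/stopwatch", "score": 29, "tier": "Experimental"},
    {"name": "Mya-Mya/CBF-LLM", "score": 28, "tier": "Experimental"},
    {"name": "Adora-Foundation/llm-energy-lab", "score": 19, "tier": "Experimental"},
    {"name": "mrconter1/PullRequestBenchmark", "score": 14, "tier": "Experimental"},
]


def group_by_tier(items):
    """Bucket project names by tier, each bucket ordered by score (desc)."""
    tiers = defaultdict(list)
    for p in sorted(items, key=lambda p: p["score"], reverse=True):
        tiers[p["tier"]].append(p["name"])
    return dict(tiers)
```

With these rows, `group_by_tier(projects)` yields two buckets, "Emerging" and "Experimental", matching the tier column above.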