LLM Evaluation Benchmarking Transformer Models

There are 6 LLM evaluation and benchmarking projects tracked in this category. The highest-rated is allenai/RL4LMs, scoring 38/100 with 2,382 GitHub stars.

Get all 6 projects as JSON:

    curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-evaluation-benchmarking&limit=20"

The API is open to everyone at 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
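The curl call above can also be made from Python. The sketch below builds the same query URL and fetches the results; note that the JSON field names used in the loop ("name", "score") are assumptions about the response shape, not documented fields, so check the actual payload before relying on them.

```python
# Hedged sketch of calling the quality endpoint shown above.
# Field names in the final loop are assumptions about the JSON shape.
import json
import urllib.parse
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain, subcategory, limit=20):
    """Assemble the query URL with properly encoded parameters."""
    params = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE}?{params}"


def fetch_projects(url):
    """Fetch and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    url = build_url("transformers", "llm-evaluation-benchmarking")
    # Assumed: the endpoint returns a list of project objects.
    for project in fetch_projects(url):
        print(project.get("name"), project.get("score"))
```

Keeping URL construction in its own function makes the query parameters easy to vary (e.g. a different subcategory) without string concatenation mistakes.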

#  Model                            Score  Tier          Description
1  allenai/RL4LMs                   38     Emerging      A modular RL library to fine-tune language models to human preferences
2  cloudguruab/modsysML             34     Emerging      Human reinforcement learning (RLHF) framework for AI models. Evaluate and...
3  modal-labs/stopwatch             29     Experimental  A tool for benchmarking LLMs on Modal
4  Mya-Mya/CBF-LLM                  28     Experimental  "CBF-LLM: Safe Control for LLM Alignment"
5  Adora-Foundation/llm-energy-lab  19     Experimental  Web application for benchmarking and comparing LLM behaviour, energy and...
6  mrconter1/PullRequestBenchmark   14     Experimental  Evaluating LLMs performance in PR reviews as an indicator for their...
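Once fetched, the project records above can be grouped by tier locally. The sketch below hard-codes the six rows from the table; the dict keys ("name", "score", "tier") are an assumption about how the API represents each project, so adapt them to the real JSON.

```python
# Hedged sketch: bucket the six projects by tier, sorted by score.
# The "name"/"score"/"tier" keys are assumed, not confirmed API fields.
from collections import defaultdict

projects = [
    {"name": "allenai/RL4LMs", "score": 38, "tier": "Emerging"},
    {"name": "cloudguruab/modsysML", "score": 34, "tier": "Emerging"},
    {"name": "modal-labs/stopwatch", "score": 29, "tier": "Experimental"},
    {"name": "Mya-Mya/CBF-LLM", "score": 28, "tier": "Experimental"},
    {"name": "Adora-Foundation/llm-energy-lab", "score": 19, "tier": "Experimental"},
    {"name": "mrconter1/PullRequestBenchmark", "score": 14, "tier": "Experimental"},
]


def group_by_tier(items):
    """Bucket project names by tier, each bucket ordered by score (desc)."""
    tiers = defaultdict(list)
    for p in sorted(items, key=lambda p: p["score"], reverse=True):
        tiers[p["tier"]].append(p["name"])
    return dict(tiers)
```

With these rows, `group_by_tier(projects)` yields two buckets, "Emerging" and "Experimental", matching the tier column above.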