LLM Comparison Evaluation Transformer Models
There are 6 LLM comparison evaluation models tracked. The highest-rated is UBC-MDS/fixml, with a score of 33/100 and 4 stars.
Get all 6 projects as JSON:

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-comparison-evaluation&limit=20"
```
The API is open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
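To script the request instead of running it by hand, the call can be wrapped in a few shell functions. This is a minimal sketch, not an official client. `build_url`, `fetch_projects`, and `fetch_cached` are hypothetical helper names. The query parameters are assumed to behave as in the `curl` command above, and the response is treated as opaque JSON because its schema is not documented on this page. The 24-hour cache only exists to avoid spending the keyless daily quota on repeated runs. API keys are not handled, since the page does not say how a key is passed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the quality-dataset endpoint from the curl example.
# Assumes domain/subcategory values are URL-safe slugs (no URL encoding is done).

BASE_URL="https://pt-edge.onrender.com/api/v1/datasets/quality"

# Print the request URL; limit defaults to 20, as in the example command.
build_url() {
  local domain="$1" subcategory="$2" limit="${3:-20}"
  printf '%s?domain=%s&subcategory=%s&limit=%s\n' \
    "$BASE_URL" "$domain" "$subcategory" "$limit"
}

# Fetch the raw JSON; -f makes curl exit non-zero on HTTP error statuses (4xx/5xx),
# -sS hides the progress bar but still prints errors.
fetch_projects() {
  curl -fsS "$(build_url "$@")"
}

# Print a cached copy if it is non-empty and less than 24 hours (1440 minutes) old;
# otherwise fetch, write to a temp file, and move it into place only on success.
fetch_cached() {
  local cache_file="$1"
  shift
  if [ -s "$cache_file" ] && [ -n "$(find "$cache_file" -mmin -1440 2>/dev/null)" ]; then
    cat "$cache_file"
    return 0
  fi
  if fetch_projects "$@" > "$cache_file.tmp"; then
    mv "$cache_file.tmp" "$cache_file"
    cat "$cache_file"
  else
    rm -f "$cache_file.tmp"
    return 1
  fi
}
```

For example, `fetch_cached projects.json transformers llm-comparison-evaluation 20 | jq .` fetches the list once and pretty-prints it (assuming `jq` is installed; `projects.json` is an arbitrary cache path). Later runs within 24 hours read the file instead of calling the API.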
| # | Model | Description | Score (out of 100) | Tier |
|---|---|---|---|---|
| 1 | UBC-MDS/fixml | LLM Tool for effective test evaluation of ML projects with curated... | 33 | Emerging |
| 2 | AstraBert/DebateLLM-Championship | 5 LLMs, 1vs1 matches to produce the most convincing argumentation in favor... | | Experimental |
| 3 | JosephTLucas/llm_test | A suite of tests to verify bias, safety, trust, and security concerns for LLMs. | | Experimental |
| 4 | iSEngLab/LLM4UT_Empirical | [ISSTA 2025] A Large-scale Empirical Study on Fine-tuning Large Language... | | Experimental |
| 5 | iSEngLab/RetriGen | [2025 TOSEM] Improving Deep Assertion Generation via Fine-Tuning... | | Experimental |
| 6 | iSEngLab/LLM4AG | [2025 TOSEM] Exploring Automated Assertion Generation via Large Language Models | | Experimental |