LLM Comparison Evaluation Transformer Models

Six LLM comparison evaluation models are tracked. The highest-rated is UBC-MDS/fixml, with a score of 33/100 and 4 stars.

Get all 6 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-comparison-evaluation&limit=20"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
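The same request can be made from a script. The following is a minimal Python sketch that rebuilds the URL shown above from its query parameters and, optionally, downloads the result. The helper names `build_quality_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented here, so the sketch returns the parsed payload without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain, subcategory, limit=20):
    """Build the request URL from its query parameters (hypothetical helper)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=10):
    """Download and parse the JSON payload (hypothetical helper).

    The response structure is an unknown here, so the parsed object
    is returned as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = build_quality_url("transformers", "llm-comparison-evaluation")
    print(url)
    # To actually download the data (counts against the daily request limit):
    # data = fetch_projects(url)
```

Building the query string with `urlencode` keeps the parameters readable and avoids manual escaping if the domain or subcategory is changed.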

#  Model                             Score  Tier
1  UBC-MDS/fixml                     33     Emerging
   LLM Tool for effective test evaluation of ML projects with curated...
2  AstraBert/DebateLLM-Championship  24     Experimental
   5 LLMs, 1vs1 matches to produce the most convincing argumentation in favor...
3  JosephTLucas/llm_test             20     Experimental
   A suite of tests to verify bias, safety, trust, and security concerns for LLMs.
4  iSEngLab/LLM4UT_Empirical         13     Experimental
   [ISSTA 2025] A Large-scale Empirical Study on Fine-tuning Large Language...
5  iSEngLab/RetriGen                 12     Experimental
   [2025 TOSEM] Improving Deep Assertion Generation via Fine-Tuning...
6  iSEngLab/LLM4AG                   12     Experimental
   [2025 TOSEM] Exploring Automated Assertion Generation via Large Language Models