LLM Comparison Evaluation Transformer Models

Six LLM comparison and evaluation projects are tracked. The highest-rated is UBC-MDS/fixml, with a score of 33/100 and 4 stars.

Get all 6 projects as JSON:

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-comparison-evaluation&limit=20"

The API is open to everyone: 100 requests/day with no key needed, or 1,000 requests/day with a free key.
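The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the JSON response schema is not documented here so the result is returned as-is, and no API key is sent (the keyed mechanism is not described on this page).

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the query URL (hypothetical helper); parameters mirror the curl call."""
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_quality(domain: str, subcategory: str, limit: int = 20,
                  timeout: float = 30.0):
    """GET the URL without a key and decode the body as JSON.

    The structure of the returned object is an assumption left to the caller;
    it is not documented on this page.
    """
    url = build_quality_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_quality("transformers", "llm-comparison-evaluation")` issues the same GET as the curl command. Keeping URL construction in its own function makes it easy to check the query string without touching the network, which matters when staying under the 100 requests/day keyless limit.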

#	Repository	Description	Score (/100)	Tier	Stars	Language
1	UBC-MDS/fixml	LLM Tool for effective test evaluation of ML projects with curated...	33	Emerging	4	Python
2	AstraBert/DebateLLM-Championship	5 LLMs, 1vs1 matches to produce the most convincing argumentation in favor...	24	Experimental	4	Jupyter Notebook
3	JosephTLucas/llm_test	A suite of tests to verify bias, safety, trust, and security concerns for LLMs.	20	Experimental	7	Python
4	iSEngLab/LLM4UT_Empirical	[ISSTA 2025] A Large-scale Empirical Study on Fine-tuning Large Language...	13	Experimental	13	Python
5	iSEngLab/RetriGen	[2025 TOSEM] Improving Deep Assertion Generation via Fine-Tuning...	12	Experimental	6	Python
6	iSEngLab/LLM4AG	[2025 TOSEM] Exploring Automated Assertion Generation via Large Language Models	12	Experimental	8	Python