Model Evaluation Diagnostics Transformer Models

Tools for systematically evaluating, diagnosing, and benchmarking transformer models across NLI, WSD, and other NLP tasks using standard test sets and evaluation frameworks. Does NOT include general model training, fine-tuning without evaluation focus, or language-specific model overviews.

There are 48 model evaluation and diagnostics projects tracked. One scores 50, placing it in the Established tier. The highest-rated is minggnim/nlp-models at 50/100, with 2 stars and 201 monthly downloads.

Get all 48 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=model-evaluation-diagnostics&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
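The same request can be sketched in Python using only the standard library. The endpoint and query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the JSON response is not documented here, so the sketch only decodes it without assuming any fields. The `limit=20` default mirrors the curl command; whether larger values are accepted is not stated on this page.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="transformers",
              subcategory="model-evaluation-diagnostics",
              limit=20):
    """Build the query URL (hypothetical helper; parameters mirror the curl command)."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Send a GET request and decode the JSON body.

    The response schema is an unknown here, so the decoded
    object is returned as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` sends one request, which counts against the daily quota described above. No API key handling is shown, since the page does not say how a key is passed.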

#	Model	Score	Tier	Stars	Language
1	minggnim/nlp-models A repository for training transformer based models	50	Established	2	Jupyter Notebook
2	IntelLabs/nlp-architect A model library for exploring state-of-the-art deep learning topologies and...	49	Emerging	2,935	Python
3	yuanzhoulvpi2017/zero_nlp Chinese NLP solutions (large models, data, models, training, inference)	49	Emerging	3,783	Jupyter Notebook
4	LoicGrobol/zeldarose Train transformer-based models.	48	Emerging	28	Python
5	CPJKU/wechsel Code for WECHSEL: Effective initialization of subword embeddings for...	42	Emerging	89	Python
6	soldni/pyterrier_sentence_transformers Create PyTerrier compatible dense indices using any sentence_transformers model	39	Emerging	6	Python
7	MahmoudWahdan/dialog-nlu Tensorflow and Keras implementation of the state of the art researches in...	39	Emerging	100	Jupyter Notebook
8	yuanzhoulvpi2017/quick_sentence_transformers sentence-transformers to onnx: faster inference for SBERT models	39	Emerging	166	Python
9	ukairia777/tensorflow-nlp-tutorial Uses TensorFlow to cover everything from text preprocessing to recent models such as Topic Models, BERT, GPT, and LLMs, and their downstream...	38	Emerging	575	Jupyter Notebook
10	HarderThenHarder/transformers_tasks ⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification,...	34	Emerging	2,412	Jupyter Notebook
11	g8a9/ferret A python package for benchmarking interpretability techniques on Transformers.	32	Emerging	215	Python
12	sinanuozdemir/oreilly-bert-nlp This repository contains code for the O'Reilly Live Online Training for BERT	29	Experimental	32	Jupyter Notebook
13	Azure/nlp-samples Japanese NLP sample codes	29	Experimental	10	Shell
14	ManashJKonwar/NLP-Transformers Transformer (BERT, GPT2, etc.) based Training Module for popular NLP tasks	27	Experimental	9	Python
15	polakowo/textai Applications using state-of-the-art in NLP	27	Experimental	6	Jupyter Notebook
16	shunk031/allennlp-shiba-model AllenNLP integration for Shiba: Japanese CANINE model	25	Experimental	12	Python
17	rajaswa/indic-syntax-evaluation Vyākarana: A Colorless Green Benchmark for Syntactic Evaluation in Indic Languages	25	Experimental	15	Jupyter Notebook
18	ropensci/pangoling An R package for estimating the log-probabilities of words in a given...	24	Experimental	12	R
19	VirtualRoyalty/gan-plus-nlp Generative adversarial approach to most popular NLP tasks	24	Experimental	4	Jupyter Notebook
20	prajjwal1/generalize_lm_nli Code for the EMNLP 2021 workshop paper "Generalization in NLI: Ways...	24	Experimental	34	Jupyter Notebook
21	stevezheng23/fewshot_nlp_pt Few-shot NLP in PyTorch	24	Experimental	4	Python
22	Nickil21/weakly-supervised-parsing Official Code for our Findings of ACL 2022 paper: Co-training an...	24	Experimental	4	Python
23	matteomedioli/BERT-KG Enriching Language Models Representations via Knowledge Graphs Regularisation	24	Experimental	3	Python
24	th789/mbr-for-nmt Characterizing the performance of minimum Bayes risk (MBR) decoding for...	23	Experimental	2	Jupyter Notebook
25	CyberAgentAILab/japanese-nli-model This repository provides the code for Japanese NLI model, a fine-tuned...	23	Experimental	6	Jupyter Notebook
26	proycon/deepfrog An NLP-suite powered by deep learning	22	Experimental	19	Rust
27	ai-forever/model-zoo NLP model zoo for Russian	22	Experimental	50	—
28	Beomi/transformers-language-modeling Train 🤗transformers with DeepSpeed: ZeRO-2, ZeRO-3	21	Experimental	23	Python
29	yucc2018/share Sharing some hands-on code examples.	21	Experimental	22	Jupyter Notebook
30	TRISTAN-ORF/RiboTIE Scripts and instructions to apply RiboTIE on Ribo-seq data	20	Experimental	19	—
31	ishan00/meta-learning-for-multi-task-multilingual Official Repository for the paper titled "Meta-Learning for Effective...	19	Experimental	9	Python
32	DFKI-NLP/gevalm Code and data for the paper "Evaluating German Transformer Language Models...	18	Experimental	7	Python
33	hppRC/simple-simcse-ja Exploring Japanese SimCSE	17	Experimental	69	Python
34	zhestyatsky/MCL-WiC Research on Multilingual and Cross-lingual Word-in-Context Disambiguation	16	Experimental	4	Jupyter Notebook
35	SapienzaNLP/xl-wsd-code Code to train and test Word Sense Disambiguation models based on different...	15	Experimental	15	Python
36	princeton-nlp/MultilingualAnalysis Repository for the paper titled: "When is BERT Multilingual? Isolating...	14	Experimental	13	Python
37	aarnetalman/nli-with-transformers Fine-tune transformers with NLI data	13	Experimental	—	Python
38	brihijoshi/granular-similarity-COLING-2020 Code for the paper "The Devil is in the Details: Evaluating Limitations of...	13	Experimental	8	Jupyter Notebook
39	RobinSmits/Dutch-NLP-Experiments This repository contains a number of experiments with Multi Lingual...	13	Experimental	5	Python
40	iamlxb3/UMAMGT Code for the publication of LREC'22	12	Experimental	3	Jupyter Notebook
41	HannaAbiAkl/PSYCHIC The official repository for the PSYCHIC model	12	Experimental	3	Jupyter Notebook
42	TRISTAN-ORF/RiboTIE_article Scripts run to produce the RiboTIE paper	12	Experimental	3	Shell
43	skomban/seq-unscrambler Unscrambles shuffled letters in a word sequence.	11	Experimental	2	Python
44	DudalaShrujana/nlp-transformers-toolkit Modular NLP pipeline utilizing Hugging Face Transformers for Sentiment...	11	Experimental	—	Python
45	mhdr3a/transformers-diagnostics Model Evaluation using SuperGLUE Diagnostic Dataset	11	Experimental	—	Python
46	bglid/haitian-creole-nlu Project designed to reimplement and build upon CreoleVal's Reading...	10	Experimental	1	Python
47	loubnabnl/canine-mednli CANINE for Medical Natural Language Inference on MedNLI data, as part of the...	10	Experimental	1	Python
48	SambhawDrag/XLNet.jl A Julia-based implementation of XLNet: A Generalized Autoregressive...	10	Experimental	1	Julia