Transformer Training Optimization Models
Tools, frameworks, and techniques for accelerating transformer model training and inference through hardware-specific optimizations, parallelism strategies, and performance tuning. Does NOT include model compression/pruning, application-specific fine-tuning, or inference deployment platforms.
42 transformer training optimization projects are tracked. Three score above 70 (the Verified tier). The highest-rated is huggingface/optimum at 90/100, with 3,325 stars and 1,613,657 monthly downloads. Four of the top 10 are actively maintained.
Get all 42 projects as JSON (set `limit` above the tracked count so a single request returns the full set):
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-training-optimization&limit=50"
Open to everyone: 100 requests/day with no key needed. A free key raises that to 1,000/day.
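For scripted access, here is a minimal Python sketch against the same endpoint. The response schema (a JSON list of project records with a `tier` field) and the `X-API-Key` header name are assumptions, not documented behavior; inspect one response before relying on them.

```python
# Minimal sketch: fetch the transformer-training-optimization dataset as JSON.
# The record fields ("tier") and the API-key header name are guesses; inspect
# one real response and adjust before relying on them.
from collections import Counter

import requests

API_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"
params = {
    "domain": "transformers",
    "subcategory": "transformer-training-optimization",
    "limit": 50,  # above 42, so one request covers every tracked project
}
headers = {}
# Hypothetical header for a free API key (1,000 requests/day); the real
# header name may differ -- check the service's documentation.
# headers["X-API-Key"] = "your-key-here"

resp = requests.get(API_URL, params=params, headers=headers, timeout=30)
resp.raise_for_status()
data = resp.json()

# Unwrap a nested payload if the list is not top-level (assumption).
projects = data if isinstance(data, list) else data.get("projects", [])

# Count projects per quality tier, assuming each record has a "tier" field.
print(Counter(p.get("tier", "unknown") for p in projects))
```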
| # | Model | Description | Tier |
|---|---|---|---|
| 1 | huggingface/optimum | 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and... | Verified |
| 2 | openvinotoolkit/nncf | Neural Network Compression Framework for enhanced OpenVINO™ inference | Verified |
| 3 | NVIDIA/Megatron-LM | Ongoing research training transformer models at scale | Verified |
| 4 | huggingface/optimum-intel | 🤗 Optimum Intel: Accelerate inference with Intel optimization tools | Established |
| 5 | RBLN-SW/optimum-rbln | ⚡ A seamless integration of HuggingFace Transformers & Diffusers with RBLN... | Established |
| 6 | eole-nlp/eole | Open language modeling toolkit based on PyTorch | Established |
| 7 | huggingface/optimum-habana | Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) | Established |
| 8 | microsoft/mup | maximal update parametrization (µP) | Established |
| 9 | olivkoch/nano-trm | An implementation of Tiny Recursive Models (TRM) | Emerging |
| 10 | NVIDIA-AI-IOT/nanoowl | A project that optimizes OWL-ViT for real-time inference with NVIDIA TensorRT. | Emerging |
| 11 | AlekseyKorshuk/optimum-transformers | Accelerated NLP pipelines for fast inference on CPU and GPU. Built with... | Emerging |
| 12 | patil-suraj/onnx_transformers | Accelerated NLP pipelines for fast inference on CPU. Built with Transformers... | Emerging |
| 13 | huggingface/optimum-graphcore | Blazing fast training of 🤗 Transformers on Graphcore IPUs | Emerging |
| 14 | LowinLi/fastgpt | ⚡ boost inference speed of GPT models in transformers by onnxruntime | Emerging |
| 15 | xrsrke/pipegoose | Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of... | Emerging |
| 16 | Jagatmohan46/tiny-recursive-model | 🚀 Implement the Tiny Recursive Model (TRM) for improved performance in... | Emerging |
| 17 | ParCIS/Chimera | Chimera: bidirectional pipeline parallelism for efficiently training... | Emerging |
| 18 | teelinsan/parallel-decoding | Repository of the paper "Accelerating Transformer Inference for Translation... | Experimental |
| 19 | Naman-ntc/FastCode | Utilities for efficient fine-tuning, inference and evaluation of code... | Experimental |
| 20 | rasbt/faster-pytorch-blog | Outlining techniques for improving the training performance of your PyTorch... | Experimental |
| 21 | alex-snd/TRecover | 📜 A python library for distributed training of a Transformer neural network... | Experimental |
| 22 | jshuadvd/LongRoPE | Implementation of the LongRoPE: Extending LLM Context Window Beyond 2... | Experimental |
| 23 | sandyresearch/chipmunk | 🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E... | Experimental |
| 24 | XingLuxi/Cal-FLOPs-for-PLM | Calculating FLOPs of Pre-trained Models in NLP | Experimental |
| 25 | 14062/Megatron-LM | Enable large-scale transformer model training with GPU-optimized tools and... | Experimental |
| 26 | dkurt/optimum-openvino | Intel OpenVINO extension for Hugging Face Transformers | Experimental |
| 27 | NachoPeinador/FRUGAL_AI_CHIP | FrugalAI Chip: Modular silicon architecture for disposable AI. Achieves... | Experimental |
| 28 | dzungphieuluuky/OuroTrace | Benchmark and evaluation of the ByteDance Ouro model based on Looped Language... | Experimental |
| 29 | dino65-dev/REPO-Attention | RePo: Language Models with Context Re-Positioning by Sakana AI | Experimental |
| 30 | KimDaeUng/PLM-Implementation | NLP Pretrained Language Models Implementation Study | Experimental |
| 31 | korovod/kenotron | Experimental fork of Nanotron, a minimalistic large language model... | Experimental |
| 32 | kyegomez/VO-ROPE | An implementation of the all-new rope from jianlin | Experimental |
| 33 | christinakim/scaling-laws-for-language-transfer | Code for Scaling Laws for Language Transfer Learning | Experimental |
| 34 | stoyan-stoyanov/transformers-calculator | Transformer Calculator - Estimate training time for transformer models. | Experimental |
| 35 | luozichen/NeonBench | A systematic study of ultra-tiny language models | Experimental |
| 36 | supersjgk/Transformers | Playing with Transformers and LLM | Experimental |
| 37 | mtszkw/fast-torch | Comparing PyTorch, JIT and ONNX for inference with Transformers | Experimental |
| 38 | Adithya1209/slm-architecture-benchmarks | Comparative study of Linear, MLP, Attention, and Transformer architectures... | Experimental |
| 39 | sakhileln/rope-pytorch | RoPE Playground – Rotary Positional Embeddings in PyTorch | Experimental |
| 40 | elvinagam/benchmarking_gpu_inference | Scripts for neural network inference on PyTorch with tools like ONNX,... | Experimental |
| 41 | MarkusSagen/Transformers-LM-Benchmark | Benchmark training and inference time for Transformer models on Huggingface | Experimental |
| 42 | rpatrik96/llm-non-identifiability | Investigating the non-identifiability of Transformers | Experimental |