All Transformer Models
6,429 models ranked by quality score · Page 13 of 65
| # | Model | Score | Tier |
|---|---|---|---|
| 1201 | iverly/llamafile-docker: Distribute and run llamafile/LLMs with a single docker image. | | Emerging |
| 1202 | xjywhu/Awesome-Multimodal-LLM-for-Code: Multimodal Large Language Models for Code Generation under Multimodal Scenarios | | Emerging |
| 1203 | ariya/chat-llm: Chat with an LLM | | Emerging |
| 1204 | jianzhnie/awesome-instruction-datasets: A collection of awesome-prompt-datasets, awesome-instruction-dataset, to... | | Emerging |
| 1205 | awaescher/llmaid: Mass-edit files with LLMs | | Emerging |
| 1206 | swordlidev/Efficient-Multimodal-LLMs-Survey: Efficient Multimodal Large Language Models: A Survey | | Emerging |
| 1207 | kyegomez/MHMoE: Community Implementation of the paper: "Multi-Head Mixture-of-Experts" In PyTorch | | Emerging |
| 1208 | YadaYuki/transformer-from-scratch: Transformer from scratch 🙊 (English to Japanese Translator by PyTorch) | | Emerging |
| 1209 | yuanzhoulvpi2017/DocumentSearch: A document search tool built on sentence transformers and chatglm | | Emerging |
| 1210 | tpoisonooo/llama.onnx: LLaMa/RWKV onnx models, quantization and testcase | | Emerging |
| 1211 | dingo-actual/infini-transformer: PyTorch implementation of Infini-Transformer from "Leave No Context Behind:... | | Emerging |
| 1212 | KolosalAI/kolosal-cli: Super lightweight Ollama + Qwen Code alternative to run Llama 3.3,... | | Emerging |
| 1213 | TIGER-AI-Lab/Pixel-Reasoner: Pixel-Level Reasoning Model trained with RL [NeurIPS25] | | Emerging |
| 1214 | skit-ai/SpeechLLM: This repository contains the training, inference, evaluation code for... | | Emerging |
| 1215 | neulab/knn-transformers: PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling... | | Emerging |
| 1216 | monologg/KoBigBird: 🦅 Pretrained BigBird Model for Korean (up to 4096 tokens) | | Emerging |
| 1217 | ymcui/Chinese-Mixtral: Chinese Mixtral mixture-of-experts large models (Chinese Mixtral MoE LLMs) | | Emerging |
| 1218 | QwenLM/Qwen2.5-Math: A series of math-specific large language models of our Qwen2 series. | | Emerging |
| 1219 | GAIR-NLP/ProX: [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality... | | Emerging |
| 1220 | ariannamethod/nanollama: Train Llama 3 models from scratch. Any scale, any personality. By Arianna Method. | | Emerging |
| 1221 | JackZeng0208/llama.cpp-android-tutorial: llama.cpp tutorial on Android phone | | Emerging |
| 1222 | FoundationVision/UniTok: [NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding | | Emerging |
| 1223 | zeyadusf/LLMs-from-Scratch: Build a Large Language Model (From Scratch) book and Finetuned Models | | Emerging |
| 1224 | AllenXiangX/SnowflakeNet: (TPAMI 2023) Snowflake Point Deconvolution for Point Cloud Completion and... | | Emerging |
| 1225 | VPGTrans/VPGTrans: Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA,... | | Emerging |
| 1226 | ShiZhengyan/DePT: [ICLR 2024] This is the repository for the paper titled "DePT: Decomposed... | | Emerging |
| 1227 | amazon-science/unified-ept: A Unified Efficient Pyramid Transformer for Semantic Segmentation, ICCVW 2021 | | Emerging |
| 1228 | torchspec-project/TorchSpec: A PyTorch native library for training speculative decoding models | | Emerging |
| 1229 | google-research/magvit: Official JAX implementation of MAGVIT: Masked Generative Video Transformer | | Emerging |
| 1230 | stanleylsx/llms_tool: A HuggingFace-based tool for training and testing large language models. Supports web UI and terminal prediction for each model, and low-parameter and full-parameter model training (pre-training, SFT, RM, PPO, D... | | Emerging |
| 1231 | nlpodyssey/cybertron: Cybertron: the home planet of the Transformers in Go | | Emerging |
| 1232 | donaldafeith/Pytorch_Merge: Merge LLMs that are split into parts | | Emerging |
| 1233 | OpenLemur/Lemur: [ICLR 2024] Lemur: Open Foundation Models for Language Agents | | Emerging |
| 1234 | YuweiYin/FinPT: FinPT: Financial Risk Prediction with Profile Tuning on Pretrained Foundation Models | | Emerging |
| 1235 | MetaGLM/FinGLM: FinGLM: dedicated to building an open, public-interest, long-lasting financial large-model project that uses open source to advance "AI + finance". | | Emerging |
| 1236 | prajjwal1/fluence: A deep learning library based on Pytorch focussed on low resource language... | | Emerging |
| 1237 | geobrain-ai/geogalactica: Code and datasets for paper "GeoGalactica: A Scientific Large Language Model... | | Emerging |
| 1238 | rafiepour/CTran: Complete code for the proposed CNN-Transformer model for natural language... | | Emerging |
| 1239 | Lupin1998/Awesome-MIM: [Survey] Masked Modeling for Self-supervised Representation Learning on... | | Emerging |
| 1240 | nicola-decao/KnowledgeEditor: Code for Editing Factual Knowledge in Language Models | | Emerging |
| 1241 | HKUDS/GraphEdit: "GraphEdit: Large Language Models for Graph Structure Learning" | | Emerging |
| 1242 | LlamaFamily/Llama-Chinese: Llama Chinese community: continuously aggregates the latest Llama learning resources and builds the best open-source ecosystem for Chinese Llama large models; fully open source and commercially usable | | Emerging |
| 1243 | declare-lab/red-instruct: Codes and datasets of the paper Red-Teaming Large Language Models using... | | Emerging |
| 1244 | Sea-Snell/CALM-Dialogue: Official code for the paper "Context-Aware Language Modeling for... | | Emerging |
| 1245 | James-QiuHaoran/LLM-serving-with-proxy-models: Efficient Interactive LLM Serving with Proxy Model-based Sequence Length... | | Emerging |
| 1246 | yifanzhang-pro/AutoMathText: [ACL 2025 Findings] Autonomous Data Selection with Zero-shot Generative... | | Emerging |
| 1247 | ylsung/VL_adapter: PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for... | | Emerging |
| 1248 | FareedKhan-dev/create-million-parameter-llm-from-scratch: Building a 2.3M-parameter LLM from scratch with LLaMA 1 architecture. | | Emerging |
| 1249 | horseee/Awesome-Efficient-LLM: A curated list for Efficient Large Language Models | | Emerging |
| 1250 | soldni/pyterrier_sentence_transformers: Create PyTerrier compatible dense indices using any sentence_transformers model | | Emerging |
| 1251 | Arkapravo-Ghosh/speech-to-text: Speech to Text Transcription using OpenAI Whisper v3 and FastAPI | | Emerging |
| 1252 | microsoft/COCO-LM: [NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for... | | Emerging |
| 1253 | Sea-Snell/JAXSeq: Train very large language models in Jax. | | Emerging |
| 1254 | virevolai/logos-shift-client: Replace expensive LLM calls with finetunes automatically | | Emerging |
| 1255 | Geotrend-research/smaller-transformers: Load What You Need: Smaller Multilingual Transformers for Pytorch and TensorFlow 2.0. | | Emerging |
| 1256 | hpcaitech/SwiftInfer: Efficient AI Inference & Serving | | Emerging |
| 1257 | NimbleEdge/sparse_transformers: Sparse Inferencing for transformer based LLMs | | Emerging |
| 1258 | MrYxJ/calculate-flops.pytorch: calflops is designed to calculate FLOPs, MACs and Parameters in all... | | Emerging |
| 1259 | clovaai/length-adaptive-transformer: Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021) | | Emerging |
| 1260 | Mj23978/sam-assistant: 🤖 Sam-assistant is a personal assistant that is designed to understand your... | | Emerging |
| 1261 | cloudmercato/ollama-benchmark: Handy tool to measure the performance and efficiency of LLMs workloads. | | Emerging |
| 1262 | NVlabs/Long-RL: Long-RL: Scaling RL to Long Sequences (NeurIPS 2025) | | Emerging |
| 1263 | zalkikar/mlm-bias: Measuring Biases in Masked Language Models for PyTorch Transformers. Support... | | Emerging |
| 1264 | TIGER-AI-Lab/Vamba: Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid... | | Emerging |
| 1265 | FareedKhan-dev/qwen3-MoE-from-scratch: A Step-by-Step Implementation of Qwen 3 MoE Architecture from Scratch | | Emerging |
| 1266 | wellcometrust/WellcomeML: Retired repository for Machine Learning utils at the Wellcome Trust (now deprecated). | | Emerging |
| 1267 | kssteven418/BigLittleDecoder: [NeurIPS'23] Speculative Decoding with Big Little Decoder | | Emerging |
| 1268 | robertvacareanu/llm4regression: Examining how large language models (LLMs) perform across various synthetic... | | Emerging |
| 1269 | kyegomez/TeraGPT: Train a production grade GPT in less than 400 lines of code. Better than... | | Emerging |
| 1270 | flowersteam/LLM-Culture: Code for the "Cultural evolution in populations of Large Language Models" paper | | Emerging |
| 1271 | 18907305772/FuseAI: FuseAI Project | | Emerging |
| 1272 | sixfingerdev/sixfinger-api: SixFinger API - Free AI Chat Api - 10-20x Faster AI Chat API - Including 10 models. | | Emerging |
| 1273 | knotgrass/attention: several types of attention modules written in PyTorch for learning purposes | | Emerging |
| 1274 | akshitac8/OW-DETR: [CVPR 2022] Official Pytorch code for OW-DETR: Open-world Detection Transformer | | Emerging |
| 1275 | wpeebles/G.pt: Official PyTorch Implementation of "Learning to Learn with Generative Models... | | Emerging |
| 1276 | datawhalechina/llm-deploy: Theory and practice of large model/LLM inference and deployment | | Emerging |
| 1277 | rishub-tamirisa/tamper-resistance: [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for... | | Emerging |
| 1278 | zhenyi4/codi: Official repository for "CODI: Compressing Chain-of-Thought into Continuous... | | Emerging |
| 1279 | xiuqhou/Relation-DETR: [ECCV2024 Oral] Official implementation of the paper "Relation DETR:... | | Emerging |
| 1280 | hitz-zentroa/GoLLIE: Guideline following Large Language Model for Information Extraction | | Emerging |
| 1281 | lqzxt/Time-R1: Time-R1 is a two-stage reinforcement fine-tuning framework that trains large... | | Emerging |
| 1282 | qizekun/ShapeLLM: [ECCV 2024] ShapeLLM: Universal 3D Object Understanding for Embodied Interaction | | Emerging |
| 1283 | gtausa197-svg/-Project-Nord-Spiking-Neural-Network-Language-Model: The first pure SNN language model trained from scratch with a fully original... | | Emerging |
| 1284 | SolomonB14D3/knowledge-fidelity: Behavioral auditing & repair toolkit for LLMs. Measures 8 dimensions via... | | Emerging |
| 1285 | kehanlu/DeSTA2: Code and model for ICASSP 2025 Paper "Developing Instruction-Following... | | Emerging |
| 1286 | rasbt/dora-from-scratch: LoRA and DoRA from Scratch Implementations | | Emerging |
| 1287 | ramonclaudio/perplexity-ai-toolkit: A lightweight Python API wrapper and CLI for Perplexity’s Sonar language models. | | Emerging |
| 1288 | shm007g/LLaMA-Cult-and-More: Large Language Models for All, 🦙 Cult and More, Stay in touch ! | | Emerging |
| 1289 | EricLBuehler/xlora: X-LoRA: Mixture of LoRA Experts | | Emerging |
| 1290 | sugarme/transformer: NLP transformers written in Go | | Emerging |
| 1291 | pranavkumaarofficial/nlcli-wizard: Natural language control for Python CLI tools using locally-trained SLMs... | | Emerging |
| 1292 | g8a9/ferret: A python package for benchmarking interpretability techniques on Transformers. | | Emerging |
| 1293 | belladoreai/llama-tokenizer-js: JS tokenizer for LLaMA 1 and 2 | | Emerging |
| 1294 | pogzyb/tourist: Open-source, LLM-ready SERP and web scraping service | | Emerging |
| 1295 | kastalimohammed1965/CLIP-fine-tune-registers-gated: Vision Transformers Needs Registers. And Gated MLPs. And +20M params. Tiny... | | Emerging |
| 1296 | GAIR-NLP/OctoThinker: Revisiting Mid-training in the Era of Reinforcement Learning Scaling | | Emerging |
| 1297 | neuralwork/instruct-finetune-mistral: Fine-tune Mistral 7B to generate fashion style suggestions | | Emerging |
| 1298 | czg1225/dParallel: [ICLR 2026] dParallel: Learnable Parallel Decoding for dLLMs | | Emerging |
| 1299 | THUDM/LongCite: LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | | Emerging |
| 1300 | Esmail-ibraheem/nanograd: nanograd🧠 ML/DL and neural net ecosystem, run models like GPT, llama, stable... | | Emerging |