Transformer Architecture Education Transformer Models

31 transformer architecture education models are tracked. One scores above 70 (the Verified tier). The highest-rated is huggingface/transformers at 100/100, with 157,811 stars and 126,779,252 monthly downloads. One of the top 10 is actively maintained.

Get all 31 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=transformer-architecture-education&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
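The same request can be made from a script. The following is a minimal Python sketch that uses only the standard library and the endpoint and query parameters shown in the curl command above. The helper names `build_url` and `fetch_json` are hypothetical, and the shape of the JSON response is not documented here, so the sketch returns the parsed payload without assuming any field names. How a free key is passed to the API is also not stated, so the sketch makes only keyless requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str, subcategory: str, limit: int) -> str:
    """Build the request URL from the three query parameters in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_json(url: str, timeout: float = 10.0):
    """Perform a GET request and parse the body as JSON.

    The response schema is an unknown here, so the parsed object is
    returned as-is for the caller to inspect.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Same parameters as the curl command; no network call happens at import time.
URL = build_url("transformers", "transformer-architecture-education", 20)

# Usage (keyless access is limited to 100 requests/day):
#   data = fetch_json(URL)
#   print(json.dumps(data, indent=2)[:500])
```

The `limit` value is kept as a parameter because the example request uses `limit=20` while 31 projects are tracked; whether larger values are accepted is not stated above.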

# | Model | Description | Score | Tier
1 | huggingface/transformers | 🤗 Transformers: the model-definition framework for state-of-the-art machine... | 100 | Verified
2 | NVIDIA/FasterTransformer | Transformer related optimization, including BERT, GPT | 48 | Emerging
3 | kyegomez/LongNet | Implementation of plug in and play Attention from "LongNet: Scaling... | 44 | Emerging
4 | pbloem/former | Simple transformer implementation from scratch in pytorch. (archival, latest... | 42 | Emerging
5 | kyegomez/SimplifiedTransformers | SimplifiedTransformer simplifies transformer block without affecting... | 40 | Emerging
6 | ARM-software/keyword-transformer | Official implementation of the Keyword Transformer: https://arxiv.org/abs/2104.00769 | 40 | Emerging
7 | ChangwenXu98/TransPolymer | Implementation of "TransPolymer: a Transformer-based language model for... | 38 | Emerging
8 | IBM/regression-transformer | Regression Transformer (2023; Nature Machine Intelligence) | 38 | Emerging
9 | bytedance/effective_transformer | Running BERT without Padding | 37 | Emerging
10 | bayesgroup/code_transformers | Empirical Study of Transformers for Source Code & A Simple Approach for... | 36 | Emerging
11 | ShivamRajSharma/Transformer-Architectures-From-Scratch | Implementation of transformers based architecture in PyTorch. | 36 | Emerging
12 | Breeze648/Transformer-from-Scratch | This repository is positioned as AI paper reproduction / implementing the Transformer from scratch. ... | 35 | Emerging
13 | octanove/shiba | Pytorch implementation and pre-trained Japanese model for CANINE, the... | 34 | Emerging
14 | dashstander/block-recurrent-transformer | Pytorch implementation of "Block Recurrent Transformers" (Hutchins & Schlag... | 34 | Emerging
15 | YadaYuki/transformer-from-scratch | Transformer from scratch 🙊 (English to Japanese Translator by PyTorch) | 33 | Emerging
16 | dcaffo98/transpormer | TranSPormer: a transformer for the Travelling Salesman Problem | 31 | Emerging
17 | Whiax/BERT-Transformer-Pytorch | Basic implementation of BERT and Transformer in Pytorch in one short python... | 31 | Emerging
18 | pmichel31415/are-16-heads-really-better-than-1 | Code for the paper "Are Sixteen Heads Really Better than One?" | 31 | Emerging
19 | amazon-science/transformers-data-augmentation | Code associated with the "Data Augmentation using Pre-trained Transformer... | 30 | Emerging
20 | THUDM/Multilingual-GLM | The multilingual variant of GLM, a general language model trained with... | 28 | Experimental
21 | nanowell/Differential-Transformer-PyTorch | PyTorch implementation of the Differential-Transformer architecture for... | 27 | Experimental
22 | forgi86/sysid-transformers | Code to reproduce the results of the paper In-context learning for... | 27 | Experimental
23 | SauravP97/toy-transformer | A decoder only Transformer implementing masked attention | 24 | Experimental
24 | IParraMartin/An-Explanation-Is-All-You-Need | The original transformer implementation from scratch. It contains... | 24 | Experimental
25 | fabienfrfr/tptt | 😊 TPTT: Transforming Pretrained Transformers into Titans | 23 | Experimental
26 | LoserCheems/WonderfulMatrices | Wonderful Matrices to Build Small Language Models | 22 | Experimental
27 | kyegomez/MLXTransformer | Simple Implementation of a Transformer in the new framework MLX by Apple | 20 | Experimental
28 | januverma/transformers-stuff | Codes, scripts, and notebooks on various aspects of transformer models. | 20 | Experimental
29 | BruinGrowly/URI_Transformer | URI-Transformer: Universal Reality Interface - A revolutionary artificial... | 20 | Experimental
30 | daniel-furman/polyglot-or-not | Are foundation LMs multilingual knowledge bases? (EMNLP 2023) | 15 | Experimental
31 | kyegomez/HeptapodLM | An Implementation of an Transformer model that generates tokens non-linearly... | 14 | Experimental
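
The relationship between score and tier in the listing can be sketched as a small function. Only the Verified cutoff (a score above 70) is stated on this page. The Emerging cutoff of 30 is an assumption inferred from the listing, where the lowest Emerging score is 30 and the highest Experimental score is 28; the site's actual rule may differ or may use additional signals. The function name `tier_for` is hypothetical.

```python
def tier_for(score: int) -> str:
    """Map a 0-100 quality score to a tier label.

    The ">70 is Verified" rule is stated on the page. The ">=30 is Emerging"
    boundary is an ASSUMPTION inferred from the listed scores, not a
    documented threshold.
    """
    if score > 70:
        return "Verified"
    if score >= 30:  # assumed boundary; the true cutoff lies somewhere in (28, 30]
        return "Emerging"
    return "Experimental"


# A few (model, score) pairs copied from the listing.
sample = [
    ("huggingface/transformers", 100),
    ("NVIDIA/FasterTransformer", 48),
    ("amazon-science/transformers-data-augmentation", 30),
    ("THUDM/Multilingual-GLM", 28),
    ("kyegomez/HeptapodLM", 14),
]

labels = [(name, tier_for(score)) for name, score in sample]
```

Under these assumed thresholds, the function reproduces the tier shown for every one of the 31 entries listed.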