NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
Leverages CUDA and Tensor Core acceleration on Volta, Turing, and Ampere GPUs to optimize transformer encoder and decoder inference, supporting mixed-precision execution (FP16, INT8, FP8) and structured sparsity. Provides native integrations with TensorFlow, PyTorch, and Triton, plus C++ APIs for custom deployment. Enables distributed inference through tensor and pipeline parallelism for scaling large models like GPT, T5, and BLOOM across multiple GPUs.
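To make the tensor-parallelism idea concrete, here is a minimal illustrative sketch (plain NumPy, not FasterTransformer's actual API): a layer's weight matrix is split column-wise into shards, each "device" computes its partial projection independently, and concatenating the partial outputs reproduces the single-device result. Device count and shapes are arbitrary for illustration.

```python
# Illustrative tensor-parallelism sketch (NOT FasterTransformer's API):
# split a weight matrix column-wise across devices, compute partial
# projections in parallel, then concatenate the outputs.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))          # activations: batch x hidden
W = rng.standard_normal((8, 16))         # full projection weight

n_devices = 2                            # hypothetical GPU count
shards = np.split(W, n_devices, axis=1)  # one column shard per device

# each "device" computes its partial projection independently
partials = [x @ shard for shard in shards]

# concatenating shard outputs matches the single-device matmul
y_parallel = np.concatenate(partials, axis=1)
y_full = x @ W
assert np.allclose(y_parallel, y_full)
```

In a real multi-GPU setup the shards live on different devices and the concatenation (or an all-gather/all-reduce, depending on how the layer is split) happens over NVLink or NCCL; the arithmetic equivalence shown here is what makes the technique exact rather than approximate.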
6,398 stars. No commits in the last 6 months.
Stars: 6,398
Forks: 930
Language: C++
License: Apache-2.0
Category: Transformers
Last pushed: Mar 27, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NVIDIA/FasterTransformer"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
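The same request can be made from Python with only the standard library. A minimal sketch, assuming the endpoint shape shown in the curl command above; the JSON response schema is not documented here, so `fetch_quality` simply decodes whatever payload comes back.

```python
# Sketch of calling the pt-edge quality API (endpoint from the curl
# example above; the response schema is an assumption, not documented here).
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the JSON payload for a repository."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `quality_url("transformers", "NVIDIA", "FasterTransformer")` reproduces the URL used in the curl command above.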
Higher-rated alternatives
huggingface/transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in...
kyegomez/LongNet
Implementation of plug in and play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
pbloem/former
Simple transformer implementation from scratch in pytorch. (archival, latest version on codeberg)
ARM-software/keyword-transformer
Official implementation of the Keyword Transformer: https://arxiv.org/abs/2104.00769
IBM/regression-transformer
Regression Transformer (2023; Nature Machine Intelligence)