NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
Leverages CUDA and Tensor Core acceleration on Volta, Turing, and Ampere GPUs to optimize transformer encoder and decoder inference, supporting mixed-precision execution (FP16, INT8, FP8) and structured sparsity. Provides native integrations with TensorFlow, PyTorch, and Triton, plus C++ APIs for custom deployment. Enables distributed inference through tensor and pipeline parallelism for scaling large models like GPT, T5, and BLOOM across multiple GPUs.
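To make the tensor-parallelism idea concrete, here is a minimal illustrative sketch (plain NumPy, not FasterTransformer's actual API): a layer's weight matrix is split column-wise into shards, each "device" computes its partial projection independently, and concatenating the partial outputs reproduces the single-device result. Device count and shapes are arbitrary for illustration.

```python
# Illustrative tensor-parallelism sketch (NOT FasterTransformer's API):
# split a weight matrix column-wise across devices, compute partial
# projections in parallel, then concatenate the outputs.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))          # activations: batch x hidden
W = rng.standard_normal((8, 16))         # full projection weight

n_devices = 2                            # hypothetical GPU count
shards = np.split(W, n_devices, axis=1)  # one column shard per device

# each "device" computes its partial projection independently
partials = [x @ shard for shard in shards]

# concatenating shard outputs matches the single-device matmul
y_parallel = np.concatenate(partials, axis=1)
y_full = x @ W
assert np.allclose(y_parallel, y_full)
```

In a real multi-GPU setup the shards live on different devices and the concatenation (or an all-gather/all-reduce, depending on how the layer is split) happens over NVLink or NCCL; the arithmetic equivalence shown here is what makes the technique exact rather than approximate.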
6,398 stars. No commits in the last 6 months.
Stars: 6,398
Forks: 930
Language: C++
License: Apache-2.0
Category: Transformers
Last pushed: Mar 27, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NVIDIA/FasterTransformer"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
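The same request can be made from Python with only the standard library. A minimal sketch, assuming the endpoint shape shown in the curl command above; the JSON response schema is not documented here, so `fetch_quality` simply decodes whatever payload comes back.

```python
# Sketch of calling the pt-edge quality API (endpoint from the curl
# example above; the response schema is an assumption, not documented here).
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the JSON payload for a repository."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `quality_url("transformers", "NVIDIA", "FasterTransformer")` reproduces the URL used in the curl command above.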
Higher-rated alternatives
huggingface/transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in...
kyegomez/LongNet
Implementation of plug in and play Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens"
pbloem/former
Simple transformer implementation from scratch in pytorch. (archival, latest version on codeberg)
ARM-software/keyword-transformer
Official implementation of the Keyword Transformer: https://arxiv.org/abs/2104.00769
IBM/regression-transformer
Regression Transformer (2023; Nature Machine Intelligence)