NVIDIA/FasterTransformer

Transformer related optimization, including BERT, GPT

Score: 48 / 100 (Emerging)

Uses CUDA and Tensor Core acceleration on Volta, Turing, and Ampere GPUs to speed up transformer encoder and decoder inference, with support for mixed-precision execution (FP16, INT8, FP8) and structured sparsity. Provides native integrations with TensorFlow, PyTorch, and Triton, plus C++ APIs for custom deployment. Enables distributed inference through tensor and pipeline parallelism, scaling large models such as GPT, T5, and BLOOM across multiple GPUs.
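As a rough illustration of the tensor-parallelism idea mentioned above (a concept sketch in plain NumPy, not FasterTransformer's actual API; all names and shapes here are hypothetical), the snippet below shards a linear layer column-wise across two workers, the way a tensor-parallel engine splits a weight matrix across GPUs:

    # Illustrative sketch only: column-parallel matmul, the core idea behind
    # tensor parallelism. Hypothetical shapes; not FasterTransformer's API.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))    # activations: batch x hidden
    W = rng.standard_normal((8, 16))   # full weight matrix: hidden x out

    # Shard W column-wise across two "GPUs"; each rank computes an output slice.
    W0, W1 = np.split(W, 2, axis=1)    # each shard: hidden x (out/2)
    y0 = x @ W0                        # computed on rank 0
    y1 = x @ W1                        # computed on rank 1

    # An all-gather over ranks reconstructs the full output.
    y = np.concatenate([y0, y1], axis=1)
    assert np.allclose(y, x @ W)       # matches the single-GPU result

Pipeline parallelism is the complementary strategy: instead of splitting individual matrices, whole layers are assigned to different GPUs and activations flow between them stage by stage.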

6,398 stars. No commits in the last 6 months.

Flags: Stale (6 months), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 22 / 25


Stars: 6,398
Forks: 930
Language: C++
License: Apache-2.0
Last pushed: Mar 27, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/NVIDIA/FasterTransformer"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
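For scripted access, a minimal Python equivalent of the curl call above might look like the following; it assumes only that the endpoint returns a JSON body, since the response schema is not documented here:

    # Fetch the quality record for NVIDIA/FasterTransformer and pretty-print it.
    # Assumes the endpoint returns JSON; field names are not guaranteed.
    import json
    from urllib.request import urlopen

    url = ("https://pt-edge.onrender.com"
           "/api/v1/quality/transformers/NVIDIA/FasterTransformer")
    with urlopen(url) as resp:
        data = json.load(resp)

    print(json.dumps(data, indent=2))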