NVIDIA/TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit and 4-bit floating-point (FP8 and FP4) precision on Hopper, Ada, and Blackwell GPUs, delivering better performance with lower memory utilization in both training and inference.

Score: 76 / 100 (Verified)

Provides framework-agnostic C++ kernels and Python APIs (PyTorch, JAX/Flax) with automatic scaling-factor management for FP8 training, eliminating manual quantization overhead. Includes fused Transformer building blocks (Linear, LayerNorm, Attention) that internally handle dynamic scaling and recipe-based precision selection (e.g., delayed scaling, hybrid formats). Integrates with major LLM training frameworks, including NeMo, Hugging Face, and DeepSpeed, across Ampere and newer GPU architectures.
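The "delayed scaling" recipe mentioned above can be sketched in a few lines: the FP8 scale factor for the current step is derived from a rolling history of recent absolute-max (amax) values rather than recalibrated per tensor. This is an illustrative stand-alone sketch, not TransformerEngine's actual implementation; the class name, `history_len`, and `margin` parameters are assumptions for demonstration.

```python
# Illustrative sketch of a delayed-scaling recipe (NOT TransformerEngine's
# real code): keep a short history of per-step amax values and derive the
# FP8 scale from the history maximum, so quantization needs no extra
# calibration pass at runtime.

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


class DelayedScaler:
    def __init__(self, history_len=16, margin=0):
        self.history_len = history_len  # how many past amax values to keep
        self.margin = margin            # extra powers of two of headroom
        self.amax_history = []
        self.scale = 1.0

    def update(self, current_amax):
        """Record this step's amax, then recompute the scale from history."""
        self.amax_history.append(current_amax)
        if len(self.amax_history) > self.history_len:
            self.amax_history.pop(0)    # drop the oldest entry
        amax = max(self.amax_history)
        if amax > 0:
            # Map the historical amax just inside the FP8 range,
            # backed off by 2**margin for safety.
            self.scale = FP8_E4M3_MAX / (amax * 2 ** self.margin)
        return self.scale


scaler = DelayedScaler(history_len=4)
scaler.update(2.0)   # history max 2.0 -> scale 448 / 2 = 224.0
scaler.update(8.0)   # history max 8.0 -> scale 448 / 8 = 56.0
print(scaler.scale)  # -> 56.0
```

In the real library, this bookkeeping happens inside the fused modules under `fp8_autocast`, which is why users never manage scale factors by hand.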

3,206 stars. Actively maintained with 65 commits in the last 30 days.

No package · No dependents
Maintenance: 25 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 25 / 25

How are scores calculated?

Stars: 3,206
Forks: 659
Language: Python
License: Apache-2.0
Last pushed: Mar 12, 2026
Commits (30d): 65

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/NVIDIA/TransformerEngine"

Open to everyone: 100 requests/day with no key needed. Get a free API key for 1,000 requests/day.