NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs and 4-bit floating point (FP4) precision on Blackwell GPUs, providing better performance with lower memory utilization in both training and inference.
Provides framework-agnostic C++ kernels and Python APIs (PyTorch, JAX/Flax) with automatic scaling factor management for FP8 training, eliminating manual quantization overhead. Includes fused Transformer building blocks (Linear, LayerNorm, Attention) that internally handle dynamic scaling and recipe-based precision selection (e.g., delayed scaling, hybrid formats). Integrates with major LLM frameworks, including NeMo, Hugging Face, and DeepSpeed training pipelines, across Ampere and newer GPU architectures.
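The core idea behind delayed scaling can be sketched in plain Python: the FP8 scale for the current step is derived from absolute-maximum (amax) values recorded on previous steps, so quantization needs no extra pass over the tensor. This is a minimal illustration of the concept, not Transformer Engine's actual implementation; the function name, margin parameter, and power-of-two rounding choice are assumptions for the sketch.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 format

def delayed_scale(amax_history, margin=0):
    """Sketch of a delayed-scaling factor computation.

    The scale for the current step comes from amax values observed on
    earlier steps (the "delayed" part), chosen so that the largest
    expected magnitude maps just inside the FP8 range.
    """
    amax = max(amax_history)
    if amax == 0.0:
        return 1.0  # nothing observed yet; leave values unscaled
    # Largest power-of-two scale with amax * scale <= FP8_E4M3_MAX,
    # backed off by an optional safety margin (in powers of two).
    exp = math.floor(math.log2(FP8_E4M3_MAX / amax)) - margin
    return 2.0 ** exp

# A tensor whose largest magnitude over the last few steps was 12.5:
history = [7.0, 12.5, 9.3]
print(delayed_scale(history))  # 32.0, since 12.5 * 32 = 400 <= 448
```

In the library itself, this bookkeeping happens inside the fused modules under an autocast-style context manager, so user code never computes or applies scales by hand.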
3,206 stars. Actively maintained with 65 commits in the last 30 days.
Stars
3,206
Forks
659
Language
Python
License
Apache-2.0
Category
ml-frameworks
Last pushed
Mar 12, 2026
Commits (30d)
65
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/NVIDIA/TransformerEngine"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related frameworks
mlcommons/inference
Reference implementations of MLPerf® inference benchmarks
datamade/usaddress
:us: a python library for parsing unstructured United States address strings into address components
GRAAL-Research/deepparse
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning
mlcommons/training
Reference implementations of MLPerf® training benchmarks
mlcommons/storage
MLPerf® Storage Benchmark Suite