NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper, Ada, and Blackwell GPUs and 4-bit floating point (FP4) precision on Blackwell GPUs, providing better performance with lower memory utilization in both training and inference.
Provides framework-agnostic C++ kernels and Python APIs (PyTorch, JAX/Flax) with automatic scaling factor management for FP8 training, eliminating manual quantization overhead. Includes fused Transformer building blocks (Linear, LayerNorm, Attention) that internally handle dynamic scaling and recipe-based precision selection (e.g., delayed scaling, hybrid formats). Integrates with major LLM frameworks, including NeMo, Hugging Face, and DeepSpeed training pipelines, across Ampere and newer GPU architectures.
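The core idea behind delayed scaling can be sketched in plain Python: the FP8 scale for the current step is derived from absolute-maximum (amax) values recorded on previous steps, so quantization needs no extra pass over the tensor. This is a minimal illustration of the concept, not Transformer Engine's actual implementation; the function name, margin parameter, and power-of-two rounding choice are assumptions for the sketch.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 format

def delayed_scale(amax_history, margin=0):
    """Sketch of a delayed-scaling factor computation.

    The scale for the current step comes from amax values observed on
    earlier steps (the "delayed" part), chosen so that the largest
    expected magnitude maps just inside the FP8 range.
    """
    amax = max(amax_history)
    if amax == 0.0:
        return 1.0  # nothing observed yet; leave values unscaled
    # Largest power-of-two scale with amax * scale <= FP8_E4M3_MAX,
    # backed off by an optional safety margin (in powers of two).
    exp = math.floor(math.log2(FP8_E4M3_MAX / amax)) - margin
    return 2.0 ** exp

# A tensor whose largest magnitude over the last few steps was 12.5:
history = [7.0, 12.5, 9.3]
print(delayed_scale(history))  # 32.0, since 12.5 * 32 = 400 <= 448
```

In the library itself, this bookkeeping happens inside the fused modules under an autocast-style context manager, so user code never computes or applies scales by hand.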
3,206 stars. Actively maintained with 65 commits in the last 30 days.
Stars
3,206
Forks
659
Language
Python
License
Apache-2.0
Category
ml-frameworks
Last pushed
Mar 12, 2026
Commits (30d)
65
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/NVIDIA/TransformerEngine"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related frameworks
mlcommons/inference
Reference implementations of MLPerf® inference benchmarks
datamade/usaddress
:us: a python library for parsing unstructured United States address strings into address components
GRAAL-Research/deepparse
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning
mlcommons/training
Reference implementations of MLPerf® training benchmarks
mlcommons/storage
MLPerf® Storage Benchmark Suite