NVIDIA/cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

Score: 70 / 100 (Verified)

This project provides specialized tools for building highly optimized linear algebra operations, particularly general matrix-matrix multiplication (GEMM), on NVIDIA GPUs. Given a computation definition and data types, it generates high-performance CUDA kernels. Researchers, performance engineers, and students working on GPU programming for numerical applications will find it useful.

9,426 stars. Actively maintained with 9 commits in the last 30 days.

Use this if you need to develop custom, extremely fast GPU kernels for linear algebra, especially matrix multiplication, via either a more accessible Python interface or traditional C++ templates.

Not ideal if you are an end-user simply looking to run existing machine learning models or use standard data science libraries without writing custom GPU code.

Tags: GPU programming · High-performance computing · Numerical optimization · Deep learning infrastructure · CUDA kernel development
No package · No dependents
Maintenance: 20 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 24 / 25

How are scores calculated?

Stars: 9,426
Forks: 1,725
Language: C++
License:
Last pushed: Mar 12, 2026
Commits (30d): 9

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/NVIDIA/cutlass"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.