LLM CUDA Optimization Transformer Models

There are 15 LLM CUDA optimization models tracked; two score above 50 (Established tier). The highest-rated is quic/efficient-transformers at 61/100 with 87 stars.

Get all 15 projects as JSON:

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-cuda-optimization&limit=20"

The API is open to everyone: 100 requests/day with no key, or 1,000/day with a free key.
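The curl command above can also be issued from Python. A minimal sketch using only the standard library, assuming the endpoint returns a JSON body (the response schema is not documented here, so inspect the payload before relying on specific fields):

```python
# Sketch: querying the quality-dataset endpoint shown in the curl example.
# The response structure is an assumption -- print it first to see the fields.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Assemble the same query URL used in the curl example."""
    params = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE}?{params}"

def fetch_projects(domain: str, subcategory: str, limit: int = 20):
    """Fetch and decode the JSON payload (requires network access)."""
    with urlopen(build_url(domain, subcategory, limit)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Build (and optionally fetch) the llm-cuda-optimization listing.
    print(build_url("transformers", "llm-cuda-optimization"))
```

Without an API key this counts against the 100 requests/day anonymous quota.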

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | quic/efficient-transformers | This library empowers users to seamlessly port pretrained models and... | 61 | Established |
| 2 | ManuelSLemos/RabbitLLM | Run 70B+ LLMs on a single 4GB GPU — no quantization required. | 57 | Established |
| 3 | alpa-projects/alpa | Training and serving large-scale neural networks with auto parallelization. | 47 | Emerging |
| 4 | deepreinforce-ai/CUDA-L2 | CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through... | 46 | Emerging |
| 5 | arm-education/Advanced-AI-Hardware-Software-Co-Design | Hands-on course materials for ML engineers to master extreme model... | 45 | Emerging |
| 6 | IST-DASLab/marlin | FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up... | 43 | Emerging |
| 7 | eqimp/hogwild_llm | Official PyTorch implementation for Hogwild! Inference: Parallel LLM... | 37 | Emerging |
| 8 | AutonomicPerfectionist/PipeInfer | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | 36 | Emerging |
| 9 | smvorwerk/xlstm-cuda | CUDA implementation of Extended Long Short Term Memory (xLSTM) with C++ and... | 32 | Emerging |
| 10 | UIC-InDeXLab/RSR | An Efficient Matrix Multiplication Algorithm for Accelerating Inference in... | 28 | Experimental |
| 11 | CodingPlatelets/transformer_MM | Accelerator for LLM Based on Chisel3 | 26 | Experimental |
| 12 | Bruce-Lee-LY/cutlass_gemm | Multiple GEMM operators are constructed with cutlass to support LLM inference. | 25 | Experimental |
| 13 | JIA-Lab-research/Q-LLM | This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration... | 25 | Experimental |
| 14 | rockyco/estFreqOffset | LLM-Assisted FPGA Design for Carrier Frequency Offset Estimation | 23 | Experimental |
| 15 | ccs96307/fast-llm-inference | Accelerating LLM inference with techniques like speculative decoding,... | 15 | Experimental |