LLM CUDA Optimization Transformer Models

There are 15 LLM CUDA optimization models tracked; two score above 50 (Established tier). The highest-rated is quic/efficient-transformers at 61/100 with 87 stars.

Get all 15 projects as JSON:

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=transformers&subcategory=llm-cuda-optimization&limit=20"

The API is open to everyone: 100 requests/day with no key, or 1,000/day with a free key.
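The curl command above can also be issued from Python. A minimal sketch using only the standard library, assuming the endpoint returns a JSON body (the response schema is not documented here, so inspect the payload before relying on specific fields):

```python
# Sketch: querying the quality-dataset endpoint shown in the curl example.
# The response structure is an assumption -- print it first to see the fields.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"

def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Assemble the same query URL used in the curl example."""
    params = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE}?{params}"

def fetch_projects(domain: str, subcategory: str, limit: int = 20):
    """Fetch and decode the JSON payload (requires network access)."""
    with urlopen(build_url(domain, subcategory, limit)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Build (and optionally fetch) the llm-cuda-optimization listing.
    print(build_url("transformers", "llm-cuda-optimization"))
```

Without an API key this counts against the 100 requests/day anonymous quota.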

| # | Model | Description | Score | Tier |
|---|-------|-------------|-------|------|
| 1 | quic/efficient-transformers | This library empowers users to seamlessly port pretrained models and... | 61 | Established |
| 2 | ManuelSLemos/RabbitLLM | Run 70B+ LLMs on a single 4GB GPU — no quantization required. | 57 | Established |
| 3 | alpa-projects/alpa | Training and serving large-scale neural networks with auto parallelization. | 47 | Emerging |
| 4 | deepreinforce-ai/CUDA-L2 | CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through... | 46 | Emerging |
| 5 | arm-education/Advanced-AI-Hardware-Software-Co-Design | Hands-on course materials for ML engineers to master extreme model... | 45 | Emerging |
| 6 | IST-DASLab/marlin | FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up... | 43 | Emerging |
| 7 | eqimp/hogwild_llm | Official PyTorch implementation for Hogwild! Inference: Parallel LLM... | 37 | Emerging |
| 8 | AutonomicPerfectionist/PipeInfer | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | 36 | Emerging |
| 9 | smvorwerk/xlstm-cuda | CUDA implementation of Extended Long Short Term Memory (xLSTM) with C++ and... | 32 | Emerging |
| 10 | UIC-InDeXLab/RSR | An Efficient Matrix Multiplication Algorithm for Accelerating Inference in... | 28 | Experimental |
| 11 | CodingPlatelets/transformer_MM | Accelerator for LLM Based on Chisel3 | 26 | Experimental |
| 12 | Bruce-Lee-LY/cutlass_gemm | Multiple GEMM operators are constructed with cutlass to support LLM inference. | 25 | Experimental |
| 13 | JIA-Lab-research/Q-LLM | This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration... | 25 | Experimental |
| 14 | rockyco/estFreqOffset | LLM-Assisted FPGA Design for Carrier Frequency Offset Estimation | 23 | Experimental |
| 15 | ccs96307/fast-llm-inference | Accelerating LLM inference with techniques like speculative decoding,... | 15 | Experimental |