LessUp/llm-speed
CUDA kernel library for LLM inference acceleration: FlashAttention, HGEMM, and Tensor Core GEMM, with pybind11 Python bindings
Stars: —
Forks: —
Language: Python
License: MIT
Category: —
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via the API:

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/LessUp/llm-speed"

Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
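The endpoint above returns JSON. Below is a minimal standard-library sketch of fetching and summarizing such a record; note that the field names (`language`, `license`, `last_pushed`) are assumptions that mirror the metadata table above, since the actual response schema is not documented on this page:

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/LessUp/llm-speed"

def fetch_quality(url: str = API_URL) -> dict:
    """GET the JSON quality record; the free tier allows 100 requests/day."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def summarize(record: dict) -> str:
    """One-line summary; the field names used here are assumptions, not a documented schema."""
    return (f"{record.get('language', '?')} repo, "
            f"{record.get('license', '?')} license, "
            f"last pushed {record.get('last_pushed', '?')}")

# Offline demo with a hand-written record shaped like the metadata table above:
sample = {"language": "Python", "license": "MIT", "last_pushed": "Mar 13, 2026"}
print(summarize(sample))  # Python repo, MIT license, last pushed Mar 13, 2026
```

Missing fields fall back to "?", so the summary stays well-formed even if the real response omits a key.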
Higher-rated alternatives
huawei-csl/SINQ
Welcome to the official repository of SINQ! A novel, fast and high-quality quantization method...
SILX-LABS/QUASAR-SUBNET
QUASAR is a long-context foundation model and decentralized evaluation subnet built on Bittensor,
stackblogger/bitnet.js
BitNet.js - A Node.js implementation of the Microsoft bitnet.cpp inference framework.
AnswerDotAI/cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking...
FMInference/H2O
[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.