BenChaliah/NVFP4-on-4090-vLLM
AdaLLM is an NVFP4-first inference runtime for Ada Lovelace GPUs (RTX 4090) with an FP8 KV cache and custom decode kernels. This repo targets NVFP4 weights and keeps the entire decode path in FP8.
Stars: 98
Forks: 3
Language: Python
License: —
Category:
Last pushed: Feb 15, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/BenChaliah/NVFP4-on-4090-vLLM"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
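For scripted use, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not the service's official client: the helper names `quality_url` and `fetch_quality` are hypothetical, and reading the path as `<segment>/<owner>/<repo>` is an assumption drawn from the shape of the curl URL above. The response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(segment: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <segment>/<owner>/<repo>, inferred only from
    the single example URL shown above ("transformers" is the segment there).
    """
    parts = [quote(part, safe="") for part in (segment, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(segment: str, owner: str, repo: str, timeout: float = 10.0) -> str:
    """Issue a GET request and return the response body as text.

    The body is not parsed because its format is not specified in this listing.
    """
    url = quality_url(segment, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

Calling `fetch_quality("transformers", "BenChaliah", "NVFP4-on-4090-vLLM")` requests the same URL as the curl command. Without a key, the 100 requests/day limit applies, so a script that polls many repositories may want to cache results locally.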
Higher-rated alternatives
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...