GeeeekExplorer/nano-vllm

Nano vLLM

Quality score: 53 / 100 (Established)

Implements core vLLM optimizations—prefix caching, tensor parallelism, CUDA graphs, and torch compilation—in a minimal ~1,200-line Python codebase. Provides a vLLM-compatible API for fast offline LLM inference with demonstrated throughput matching or exceeding the full vLLM implementation. Designed for educational clarity and efficient deployment on resource-constrained hardware like consumer GPUs.

No package published · No dependents
Maintenance: 6 / 25
Adoption: 10 / 25
Maturity: 15 / 25
Community: 22 / 25

Stars: 12,189
Forks: 1,704
Language: Python
License: MIT
Last pushed: Nov 03, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/GeeeekExplorer/nano-vllm"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
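The four category scores above appear to sum to the overall score (6 + 10 + 15 + 22 = 53). A minimal sketch of consuming the endpoint and recomputing that total, assuming the JSON response carries one field per category (the field names here are hypothetical; the real payload may differ):

```python
import json
import urllib.request

# Endpoint as shown on this page.
API_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/GeeeekExplorer/nano-vllm"


def overall_score(scores: dict) -> int:
    # Each category is scored out of 25; the four are assumed to sum
    # to the overall score out of 100.
    return sum(scores[k] for k in ("maintenance", "adoption", "maturity", "community"))


def fetch_quality(url: str = API_URL) -> dict:
    # Fetch and decode the JSON payload (response shape assumed).
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Worked example with the figures shown on this page (field names assumed):
sample = {"maintenance": 6, "adoption": 10, "maturity": 15, "community": 22}
print(overall_score(sample))  # 53
```

Within the free tier, a script like this can poll a handful of repositories daily without an API key.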