vLLM and GPUStack
vLLM is a core inference engine that GPUStack wraps and orchestrates; the two are complements rather than competitors. GPUStack adds multi-engine selection and performance tuning on top of vLLM's serving capabilities instead of replacing them.
About vLLM
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Implements PagedAttention for efficient KV cache management and continuous request batching to maximize GPU utilization. Supports multiple quantization schemes (GPTQ, AWQ, INT4/8, FP8), speculative decoding, and tensor/pipeline parallelism across NVIDIA, AMD, Intel, and TPU hardware. Provides OpenAI-compatible API endpoints and integrates directly with Hugging Face models, including multi-modal and mixture-of-experts architectures.
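Because vLLM's server speaks the OpenAI API, any OpenAI-style client can query it over HTTP. A minimal sketch using only the standard library, assuming a vLLM server is already running locally on port 8000 and serving a model named `my-model` (both the base URL and the model name here are assumptions, not fixed defaults):

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    # Standard OpenAI chat-completions payload; vLLM accepts this schema.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }


def chat(base_url: str, model: str, prompt: str) -> str:
    # POST to the OpenAI-compatible chat-completions route exposed by vLLM.
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Extract the assistant's reply from the first completion choice.
    return body["choices"][0]["message"]["content"]
```

Usage would look like `chat("http://localhost:8000", "my-model", "Hello")` against a running server; the same call works unchanged against a vLLM instance that GPUStack has deployed, since GPUStack surfaces the engine's OpenAI-compatible endpoint.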
About GPUStack
gpustack/gpustack
Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.