vllm vs. xllm
Scores

               vllm    xllm
Maintenance    25/25   25/25
Adoption       25/25   10/25
Maturity       25/25   15/25
Community      25/25   22/25
Stats

               vllm        xllm
Stars          73,007      1,081
Forks          14,312      149
Downloads      7,953,905   —
Commits (30d)  996         136
Language       Python      C++
License        Apache-2.0  —
Risk flags

vllm: no risk flags.
xllm: no published package, no known dependents.
About vllm
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Implements PagedAttention for efficient KV-cache management and continuous request batching to maximize GPU utilization. Supports multiple quantization schemes (GPTQ, AWQ, INT4/8, FP8), speculative decoding, and tensor/pipeline parallelism across NVIDIA, AMD, Intel, and TPU hardware. Provides OpenAI-compatible API endpoints and integrates directly with Hugging Face models, including multi-modal and mixture-of-experts architectures.
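The core PagedAttention idea can be illustrated with a toy block allocator. This is a hypothetical sketch, not vLLM's actual implementation: the KV cache is carved into fixed-size physical blocks, each sequence keeps a block table mapping logical token positions to physical blocks, and finished sequences return their blocks to the pool so new requests can be batched in continuously.

```python
# Toy sketch of PagedAttention-style block allocation (illustrative only;
# names and structure are hypothetical, not vLLM's real code). Memory is
# allocated block-by-block on demand rather than reserved up front for the
# maximum sequence length, which is what keeps KV-cache waste low.

BLOCK_SIZE = 4  # tokens per KV-cache block (a small value for illustration)

class BlockManager:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))   # pool of free physical block ids
        self.tables = {}                      # seq_id -> list of block ids

    def append_token(self, seq_id, pos):
        """Ensure the block holding token `pos` is allocated for `seq_id`."""
        table = self.tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE >= len(table):   # crossed into a new block
            table.append(self.free.pop())     # grab one block on demand
        return table[pos // BLOCK_SIZE]

    def free_seq(self, seq_id):
        """Return a finished sequence's blocks to the pool, so continuous
        batching can immediately admit a waiting request."""
        self.free.extend(self.tables.pop(seq_id, []))

mgr = BlockManager(num_blocks=8)
for pos in range(6):                 # sequence "A" generates 6 tokens
    mgr.append_token("A", pos)
print(len(mgr.tables["A"]))          # -> 2 (two 4-token blocks cover 6 tokens)
mgr.free_seq("A")
print(len(mgr.free))                 # -> 8 (all blocks back in the pool)
```

The point of the design is that a sequence holding 6 tokens occupies only 2 blocks instead of a max-length reservation, and freed blocks are reusable immediately.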
About xllm
jd-opensource/xllm
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
Scores updated daily from GitHub, PyPI, and npm data.