vllm vs. xllm
Scores

               vllm    xllm
Maintenance    25/25   25/25
Adoption       25/25   10/25
Maturity       25/25   15/25
Community      25/25   22/25
Stats

               vllm        xllm
Stars          73,007      1,081
Forks          14,312      149
Downloads      7,953,905   —
Commits (30d)  996         136
Language       Python      C++
License        Apache-2.0  —
Risk flags

vllm: no risk flags.
xllm: no published package, no known dependents.
About vllm
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Implements PagedAttention for efficient KV-cache management and continuous request batching to maximize GPU utilization. Supports multiple quantization schemes (GPTQ, AWQ, INT4/8, FP8), speculative decoding, and tensor/pipeline parallelism across NVIDIA, AMD, Intel, and TPU hardware. Provides OpenAI-compatible API endpoints and integrates directly with Hugging Face models, including multi-modal and mixture-of-experts architectures.
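The core PagedAttention idea can be illustrated with a toy block allocator. This is a hypothetical sketch, not vLLM's actual implementation: the KV cache is carved into fixed-size physical blocks, each sequence keeps a block table mapping logical token positions to physical blocks, and finished sequences return their blocks to the pool so new requests can be batched in continuously.

```python
# Toy sketch of PagedAttention-style block allocation (illustrative only;
# names and structure are hypothetical, not vLLM's real code). Memory is
# allocated block-by-block on demand rather than reserved up front for the
# maximum sequence length, which is what keeps KV-cache waste low.

BLOCK_SIZE = 4  # tokens per KV-cache block (a small value for illustration)

class BlockManager:
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))   # pool of free physical block ids
        self.tables = {}                      # seq_id -> list of block ids

    def append_token(self, seq_id, pos):
        """Ensure the block holding token `pos` is allocated for `seq_id`."""
        table = self.tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE >= len(table):   # crossed into a new block
            table.append(self.free.pop())     # grab one block on demand
        return table[pos // BLOCK_SIZE]

    def free_seq(self, seq_id):
        """Return a finished sequence's blocks to the pool, so continuous
        batching can immediately admit a waiting request."""
        self.free.extend(self.tables.pop(seq_id, []))

mgr = BlockManager(num_blocks=8)
for pos in range(6):                 # sequence "A" generates 6 tokens
    mgr.append_token("A", pos)
print(len(mgr.tables["A"]))          # -> 2 (two 4-token blocks cover 6 tokens)
mgr.free_seq("A")
print(len(mgr.free))                 # -> 8 (all blocks back in the pool)
```

The point of the design is that a sequence holding 6 tokens occupies only 2 blocks instead of a max-length reservation, and freed blocks are reusable immediately.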
About xllm
jd-opensource/xllm
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
Scores updated daily from GitHub, PyPI, and npm data.