vllm-project/vllm-omni
A framework for efficient inference with omni-modality models
Extends vLLM's text-focused architecture to any-to-any multimodal inference (text, image, video, and audio), supporting non-autoregressive models such as Diffusion Transformers alongside traditional autoregressive generation. Uses fully disaggregated pipeline execution with OmniConnector and dynamic resource allocation across heterogeneous stages, overlapping pipeline stages for high throughput. Provides tensor, pipeline, data, and expert parallelism, OpenAI-compatible serving APIs, and integration with Hugging Face models, including Qwen-Omni and multimodal generation variants.
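As a minimal sketch of client usage: vLLM-style OpenAI-compatible servers are typically queried with the standard openai Python client. The port, endpoint path, and model id below are illustrative assumptions, not details confirmed by this page.

# Sketch of querying a vLLM-Omni OpenAI-compatible server.
# Assumptions (not confirmed here): server runs locally on port 8000
# and serves a Qwen-Omni checkpoint; adjust both to your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="EMPTY",                      # vLLM-style servers commonly ignore the key
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Omni-7B",  # hypothetical model id for illustration
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)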
3,197 stars and 16,988 monthly downloads. Actively maintained with 353 commits in the last 30 days. Available on PyPI.
Stars: 3,197
Forks: 550
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 18, 2026
Monthly downloads: 16,988
Commits (30d): 353
Dependencies: 18
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/vllm-project/vllm-omni"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
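For programmatic access, here is a minimal Python sketch of the same request as the curl command above. It assumes the endpoint returns JSON; the X-API-Key header name for keyed access is a guess, since this page does not document it.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/vllm-project/vllm-omni"

# Keyless access: 100 requests/day (per the note above).
resp = requests.get(URL, timeout=10)
resp.raise_for_status()
data = resp.json()  # assumes a JSON response body
print(data)

# With a free key (1,000 requests/day). The header name below is an
# assumption; check the API documentation for the real one.
# resp = requests.get(URL, headers={"X-API-Key": "YOUR_KEY"}, timeout=10)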