vllm-project/vllm-omni

A framework for efficient model inference with omni-modality models

Quality score: 93 / 100 (Verified)

Extends vLLM's text-focused architecture to handle any-to-any multimodal inference across text, image, video, and audio, with support for non-autoregressive models such as Diffusion Transformers alongside traditional autoregressive generation. Uses fully disaggregated pipeline execution with OmniConnector and dynamic resource allocation across heterogeneous stages, enabling pipelined overlap of stages for high throughput. Provides tensor, pipeline, data, and expert parallelism and OpenAI-compatible serving APIs, and integrates with Hugging Face models including Qwen-Omni and multimodal generation variants.

3,197 stars and 16,988 monthly downloads. Actively maintained with 353 commits in the last 30 days. Available on PyPI.
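For a quick start against a running server, here is a minimal sketch of a request through the OpenAI-compatible API. It assumes vllm-omni keeps vLLM's serving conventions (an OpenAI-style endpoint at /v1 on port 8000 accepting a placeholder API key); the model name Qwen/Qwen2.5-Omni-7B and the image URL are illustrative assumptions, not values documented on this page.

# Minimal sketch: querying a locally running vllm-omni server via its
# OpenAI-compatible API. Host, port, model name, and image URL are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local serving endpoint
    api_key="EMPTY",                      # vLLM-style servers accept a placeholder key
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Omni-7B",  # illustrative omni-modal model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)

Text-only, audio, or video inputs would follow the same message structure with different content types, subject to what the served model supports.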

Maintenance: 25 / 25
Adoption: 20 / 25
Maturity: 24 / 25
Community: 24 / 25


Stars: 3,197
Forks: 550
Language: Python
License: Apache-2.0
Last pushed: Mar 18, 2026
Monthly downloads: 16,988
Commits (30d): 353
Dependencies: 18

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/vllm-project/vllm-omni"

Open to everyone: 100 requests per day with no key needed. Get a free key for 1,000 requests per day.
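The same data can be pulled from a script. Below is a minimal sketch using Python's requests library, assuming the endpoint returns JSON; the response schema is not documented here, so the example simply prints whatever comes back.

# Minimal sketch: fetching the quality data from Python instead of curl.
# Assumes a JSON response; no specific field names are assumed.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/vllm-project/vllm-omni"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
data = resp.json()
print(data)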