vllm-project/vllm-omni
A framework for efficient inference with omni-modality models
Extends vLLM's text-focused architecture to any-to-any multimodal inference (text, image, video, and audio), supporting non-autoregressive models such as Diffusion Transformers alongside traditional autoregressive generation. Uses fully disaggregated pipeline execution with OmniConnector and dynamic resource allocation across heterogeneous stages, overlapping pipeline stages for high throughput. Provides tensor, pipeline, data, and expert parallelism, OpenAI-compatible serving APIs, and integration with Hugging Face models, including Qwen-Omni and multimodal generation variants.
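As a minimal sketch of client usage: vLLM-style OpenAI-compatible servers are typically queried with the standard openai Python client. The port, endpoint path, and model id below are illustrative assumptions, not details confirmed by this page.

# Sketch of querying a vLLM-Omni OpenAI-compatible server.
# Assumptions (not confirmed here): server runs locally on port 8000
# and serves a Qwen-Omni checkpoint; adjust both to your deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="EMPTY",                      # vLLM-style servers commonly ignore the key
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-Omni-7B",  # hypothetical model id for illustration
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)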
3,197 stars and 16,988 monthly downloads. Actively maintained with 353 commits in the last 30 days. Available on PyPI.
Stars: 3,197
Forks: 550
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 18, 2026
Monthly downloads: 16,988
Commits (30d): 353
Dependencies: 18
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/vllm-project/vllm-omni"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
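For programmatic access, here is a minimal Python sketch of the same request as the curl command above. It assumes the endpoint returns JSON; the X-API-Key header name for keyed access is a guess, since this page does not document it.

import requests

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/vllm-project/vllm-omni"

# Keyless access: 100 requests/day (per the note above).
resp = requests.get(URL, timeout=10)
resp.raise_for_status()
data = resp.json()  # assumes a JSON response body
print(data)

# With a free key (1,000 requests/day). The header name below is an
# assumption; check the API documentation for the real one.
# resp = requests.get(URL, headers={"X-API-Key": "YOUR_KEY"}, timeout=10)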