omlx and vllm-mlx
omlx and vllm-mlx are competing inference servers with overlapping capabilities (continuous batching, Apple Silicon optimization via MLX) and different feature trade-offs: omlx emphasizes macOS integration, while vllm-mlx prioritizes OpenAI/Anthropic API compatibility and multimodal model support.
About omlx
jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
Supports multi-model serving with automatic LRU eviction and manual pinning, alongside vision-language models and embedding/reranker inference, all via OpenAI-compatible API endpoints. The KV cache persists across hot (RAM) and cold (SSD) tiers using block-based management with prefix sharing, so cached context can be restored from disk on subsequent requests even after a server restart. A built-in web dashboard provides real-time monitoring, per-model configuration (sampling parameters, TTL, aliases), and a direct chat interface, with MCP (Model Context Protocol) support for tool integration.
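Because omlx exposes OpenAI-compatible endpoints, any OpenAI-style client can point at it. A minimal sketch of the request shape follows; the port and model alias are assumptions, so substitute whatever omlx reports in its menu-bar UI or dashboard:

```python
import json

# Assumed local endpoint -- the port is a placeholder, not an omlx default.
BASE_URL = "http://localhost:8080/v1"

# Standard OpenAI chat-completions payload; "my-model" stands in for a
# model name or alias configured via the omlx dashboard.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Summarize continuous batching."}],
    "temperature": 0.7,
}

# The request would be sent as POST {BASE_URL}/chat/completions,
# e.g. by an OpenAI SDK configured with base_url=BASE_URL.
body = json.dumps(payload)
```

The same payload works for cache-warm follow-up requests: because omlx shares KV-cache prefixes, repeating the same leading messages lets the server reuse (or restore from SSD) the cached context.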
About vllm-mlx
waybarrios/vllm-mlx
OpenAI- and Anthropic-compatible server for Apple Silicon. Runs LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
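Since vllm-mlx advertises Anthropic compatibility (which is how Claude Code can target it), Anthropic-style clients can send Messages API requests to the local server. A sketch of that request shape, where the port and model name are assumptions rather than vllm-mlx defaults:

```python
import json

# Assumed local endpoint; check the vllm-mlx server output for the real port.
BASE_URL = "http://localhost:8000"

# Anthropic Messages API payload shape; the model name is a placeholder
# for whatever model the server has loaded.
payload = {
    "model": "qwen2-vl",   # example name, an assumption
    "max_tokens": 256,     # required field in the Messages API
    "messages": [{"role": "user", "content": "Explain continuous batching."}],
}

# Sent as POST {BASE_URL}/v1/messages (the Anthropic Messages endpoint path).
body = json.dumps(payload)
```

For OpenAI-style clients the same server would instead be addressed at the `/v1/chat/completions` path with the payload shape shown for omlx above.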