waybarrios/vllm-mlx
An OpenAI- and Anthropic-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
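Because the server advertises OpenAI compatibility, any standard OpenAI client should be able to talk to it. A minimal sketch using the official OpenAI Python SDK, assuming the server is running locally on port 8000; the base URL and model id below are placeholders for illustration, not values taken from this listing:

from openai import OpenAI

# Point the standard OpenAI client at a locally running vllm-mlx server.
# Base URL and model name are assumptions; check the repo's docs for
# the actual defaults.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",  # placeholder model id
    messages=[{"role": "user", "content": "Hello from Apple Silicon!"}],
)
print(response.choices[0].message.content)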
579 stars. Actively maintained with 59 commits in the last 30 days.
Stars: 579
Forks: 87
Language: Python
License: —
Category: —
Last pushed: Mar 12, 2026
Commits (30d): 59
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/waybarrios/vllm-mlx"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
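The same call in Python, using only the standard library. The response schema isn't documented in this listing, so this sketch just assumes the endpoint returns JSON and pretty-prints whatever comes back:

import json
import urllib.request

# Same endpoint as the curl example above; no API key needed
# within the free 100 requests/day tier.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/waybarrios/vllm-mlx"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# The response fields aren't documented here, so print the raw JSON.
print(json.dumps(data, indent=2))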
Related tools
jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the...
jordanhubbard/nanolang
A tiny experimental language designed to be targeted by coding LLMs
josStorer/RWKV-Runner
An RWKV management and startup tool; fully automated, only 8 MB. Provides an interface...
akivasolutions/tightwad
Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative decoding proxy gives you...
petrukha-ivan/mlx-swift-structured
Structured output generation in Swift