waybarrios/vllm-mlx
An OpenAI- and Anthropic-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
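Because the server advertises OpenAI compatibility, any standard OpenAI client should be able to talk to it. A minimal sketch using the official OpenAI Python SDK, assuming the server is running locally on port 8000; the base URL and model id below are placeholders for illustration, not values taken from this listing:

from openai import OpenAI

# Point the standard OpenAI client at a locally running vllm-mlx server.
# Base URL and model name are assumptions; check the repo's docs for
# the actual defaults.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-3B-Instruct-4bit",  # placeholder model id
    messages=[{"role": "user", "content": "Hello from Apple Silicon!"}],
)
print(response.choices[0].message.content)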
579 stars. Actively maintained with 59 commits in the last 30 days.
Stars: 579
Forks: 87
Language: Python
License: —
Category: —
Last pushed: Mar 12, 2026
Commits (30d): 59
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/waybarrios/vllm-mlx"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
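The same call in Python, using only the standard library. The response schema isn't documented in this listing, so this sketch just assumes the endpoint returns JSON and pretty-prints whatever comes back:

import json
import urllib.request

# Same endpoint as the curl example above; no API key needed
# within the free 100 requests/day tier.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/waybarrios/vllm-mlx"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# The response fields aren't documented here, so print the raw JSON.
print(json.dumps(data, indent=2))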
Related tools
jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the...
jordanhubbard/nanolang
A tiny experimental language designed to be targeted by coding LLMs
josStorer/RWKV-Runner
An RWKV management and startup tool; fully automated, only 8 MB. Provides an interface...
akivasolutions/tightwad
Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative decoding proxy gives you...
petrukha-ivan/mlx-swift-structured
Structured output generation in Swift