waybarrios/vllm-mlx
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
This project lets developers and AI engineers run large language models and vision-language models locally on Apple Silicon Macs. It accepts text, image, video, and audio inputs and produces generated text, image descriptions, audio transcriptions, or embeddings. It targets anyone building or experimenting with AI applications who needs to deploy models locally on Apple hardware.
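Since the server is OpenAI-compatible, any OpenAI-style chat request should work against it. A minimal sketch of building such a request is below; the host/port (`localhost:8000`) and the model name are assumptions, not taken from the repo's docs.

```python
import json

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Model name below is a placeholder; use whatever model you launched the server with.
body = json.dumps(chat_payload("mlx-community/Llama-3.2-3B-Instruct-4bit", "Hello"))
# POST `body` to http://localhost:8000/v1/chat/completions (assumed default address)
# with any OpenAI-compatible HTTP client.
```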
579 stars. Actively maintained with 113 commits in the last 30 days.
Use this if you are a developer or AI engineer building applications around large language models or multimodal AI and want to run them efficiently on an Apple Silicon Mac.
Not ideal if you don't have an Apple Silicon Mac, or if you're a casual user looking for a pre-packaged consumer application rather than a developer tool.
Stars: 579
Forks: 87
Language: Python
License: —
Category: —
Last pushed: Mar 12, 2026
Commits (30d): 113
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/waybarrios/vllm-mlx"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
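The endpoint above can be called from code as well. A small sketch using only the standard library, with the URL pattern taken from the curl example; how a free API key is attached (header vs. query parameter) is not documented here, so this uses the keyless form only.

```python
def quality_url(owner: str, repo: str) -> str:
    """Build the pt-edge quality endpoint URL for a GitHub repo
    (pattern from the curl example above)."""
    return f"https://pt-edge.onrender.com/api/v1/quality/transformers/{owner}/{repo}"

# Example usage (network call, run only when online):
# import json, urllib.request
# with urllib.request.urlopen(quality_url("waybarrios", "vllm-mlx")) as resp:
#     data = json.load(resp)
```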
Related models
Blaizzy/mlx-vlm
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac...
b4rtaz/distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM...
petrukha-ivan/mlx-swift-structured
Structured output generation in Swift
armbues/SiLLM
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple...
microsoft/batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.