waybarrios/vllm-mlx
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
This project lets developers and AI engineers run large language models and vision-language models locally on Apple Silicon Macs. It accepts text, image, video, and audio inputs and produces generated text, image descriptions, audio transcriptions, or embeddings. It targets anyone building or experimenting with AI applications who needs to deploy models locally on Apple hardware.
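Since the server is OpenAI-compatible, any OpenAI-style chat request should work against it. A minimal sketch of building such a request is below; the host/port (`localhost:8000`) and the model name are assumptions, not taken from the repo's docs.

```python
import json

def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Model name below is a placeholder; use whatever model you launched the server with.
body = json.dumps(chat_payload("mlx-community/Llama-3.2-3B-Instruct-4bit", "Hello"))
# POST `body` to http://localhost:8000/v1/chat/completions (assumed default address)
# with any OpenAI-compatible HTTP client.
```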
579 stars. Actively maintained with 113 commits in the last 30 days.
Use this if you are a developer or AI engineer building applications around large language models or multimodal AI and want to run them efficiently on an Apple Silicon Mac.
Not ideal if you don't have an Apple Silicon Mac, or if you're a casual user looking for a pre-packaged consumer application rather than a developer tool.
Stars: 579
Forks: 87
Language: Python
License: —
Category: —
Last pushed: Mar 12, 2026
Commits (30d): 113
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/waybarrios/vllm-mlx"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
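The endpoint above can be called from code as well. A small sketch using only the standard library, with the URL pattern taken from the curl example; how a free API key is attached (header vs. query parameter) is not documented here, so this uses the keyless form only.

```python
def quality_url(owner: str, repo: str) -> str:
    """Build the pt-edge quality endpoint URL for a GitHub repo
    (pattern from the curl example above)."""
    return f"https://pt-edge.onrender.com/api/v1/quality/transformers/{owner}/{repo}"

# Example usage (network call, run only when online):
# import json, urllib.request
# with urllib.request.urlopen(quality_url("waybarrios", "vllm-mlx")) as resp:
#     data = json.load(resp)
```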
Related models
Blaizzy/mlx-vlm
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac...
b4rtaz/distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM...
petrukha-ivan/mlx-swift-structured
Structured output generation in Swift
armbues/SiLLM
SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple...
microsoft/batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.