vllm-mlx and mlx-flash

vllm-mlx
58
Established
mlx-flash
28
Experimental
Maintenance 22/25
Adoption 10/25
Maturity 5/25
Community 21/25
Maintenance 13/25
Adoption 6/25
Maturity 9/25
Community 0/25
Stars: 579
Forks: 87
Downloads:
Commits (30d): 113
Language: Python
License:
Stars: 18
Forks:
Downloads:
Commits (30d): 0
Language: Python
License: MIT
No License No Package No Dependents
No Package No Dependents

About vllm-mlx

waybarrios/vllm-mlx

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.

This project helps developers and engineers working with AI applications to run large language models and vision-language models on their Apple Silicon Macs much faster. It takes various inputs like text, images, videos, or audio, processes them using different AI models, and produces outputs such as generated text, image descriptions, audio transcriptions, or embeddings. It's designed for anyone building or experimenting with AI solutions who needs to deploy models locally on Apple hardware.

AI-development machine-learning-engineering LLM-deployment multimodal-AI Apple-Silicon-optimization

About mlx-flash

matt-k-wong/mlx-flash

Lightning-fast MLX utilities and optimizations for Apple Silicon

This project enables you to run very large AI models, like those with tens or hundreds of billions of parameters, directly on your Apple Mac, even if it has limited memory. It takes an existing large language model and efficiently streams its components from your Mac's fast storage, allowing you to get immediate text generation or analysis without needing to shrink or alter the model. This is ideal for AI practitioners, researchers, or developers who want to experiment with or deploy large models locally on their Apple Silicon machines.

large-language-models on-device-ai ai-model-deployment apple-silicon-ml ml-research

Scores updated daily from GitHub, PyPI, and npm data. How scores work