Apple Silicon LLM Inference Tools
Tools and frameworks for optimizing LLM inference, training, and deployment specifically on Apple Silicon (M1/M2/M3) using the MLX framework. Includes server implementations, UI wrappers, and performance optimization utilities. Does NOT include general LLM frameworks, non-Apple-specific inference servers, or tools without native MLX/Metal support.
There are 55 Apple Silicon LLM inference tools tracked. Five score above 50 (the established tier). The highest-rated is jundot/omlx at 65/100 with 4,057 stars. Four of the top 10 are actively maintained.
Get all 55 projects as JSON (raise `limit` as needed):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=apple-silicon-llm-inference&limit=20"
```

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
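The endpoint above can also be scripted. Below is a minimal Python sketch, assuming the response decodes to a JSON list of records with `name`, `score`, and `tier` fields (those field names are guesses; inspect the actual payload before relying on them):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Assemble the dataset query URL from its documented parameters."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def by_tier(projects: list[dict], tier: str) -> list[dict]:
    """Keep only records in one tier ('Established', 'Emerging', or 'Experimental')."""
    return [p for p in projects if p.get("tier") == tier]


def fetch(domain: str, subcategory: str, limit: int = 20) -> list[dict]:
    """Fetch and decode the dataset (keyless access is capped at 100 requests/day)."""
    with urlopen(build_url(domain, subcategory, limit)) as resp:
        return json.load(resp)
```

For example, `by_tier(fetch("llm-tools", "apple-silicon-llm-inference", limit=55), "Established")` would pull the full list and keep the five established-tier entries, subject to the daily quota.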
| # | Tool | Description | Tier |
|---|---|---|---|
| 1 | jundot/omlx | LLM inference server with continuous batching & SSD caching for Apple... | Established |
| 2 | josStorer/RWKV-Runner | A RWKV management and startup tool, full automation, only 8MB. And provides... | Established |
| 3 | waybarrios/vllm-mlx | OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and... | Established |
| 4 | jordanhubbard/nanolang | A tiny experimental language designed to be targeted by coding LLMs | Established |
| 5 | akivasolutions/tightwad | Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative... | Established |
| 6 | petrukha-ivan/mlx-swift-structured | Structured output generation in Swift | Emerging |
| 7 | parasail-ai/openai-batch | Make OpenAI batch easy to use. | Emerging |
| 8 | uncSoft/anubis-oss | Local LLM Testing & Benchmarking for Apple Silicon | Emerging |
| 9 | mit-han-lab/TinyChatEngine | TinyChatEngine: On-Device LLM Inference Library | Emerging |
| 10 | eelbaz/dgx-spark-vllm-setup | One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs... | Emerging |
| 11 | da-z/mlx-ui | A simple UI / Web / Frontend for MLX mlx-lm using Streamlit. | Emerging |
| 12 | icppWorld/icgpt | On-chain LLMs for the Internet Computer | Emerging |
| 13 | OpenLMLab/MOSS_Vortex | Moss Vortex is a lightweight and high-performance deployment and inference... | Emerging |
| 14 | druide67/asiai | Multi-engine LLM benchmark & monitoring CLI for Apple Silicon | Emerging |
| 15 | Sub-Soft/Siliv | macOS menu-bar utility to adjust Apple Silicon GPU VRAM allocation | Emerging |
| 16 | N1k1tung/infer-ring | Infer Ring is an iOS and macOS app that facilitates cross-device LLM... | Emerging |
| 17 | altunenes/calcarine | Desktop VLM: Real-time FastVLM analysis of video & textures with live compute shaders | Emerging |
| 18 | makit/makit-llm-lambda | Example showing how to run a LLM fully inside an AWS Lambda Function | Experimental |
| 19 | seasonjs/rwkv | Pure Go for RWKV | Experimental |
| 20 | Mattbusel/llm-cpp | The C++ LLM toolkit. 26 single-header libraries for streaming, caching, cost... | Experimental |
| 21 | Mizistein/omlx | 🤖 Optimize LLM inference on Mac with continuous batching and SSD caching... | Experimental |
| 22 | jranaraki/vllm-fit | A CLI tool designed to simply recommend (conservative), and/or profile (to... | Experimental |
| 23 | unit-mesh/edge-infer | EdgeInfer enables efficient edge intelligence by running small AI models,... | Experimental |
| 24 | ziozzang/Mac_mlx_phi-2_server | Test server code for the Phi-2 model; supports the OpenAI API spec | Experimental |
| 25 | SemiAnalysisAI/InferenceX-app | Dashboard for InferenceX™, Open Source Continuous Inference | Experimental |
| 26 | AI-DarwinLabs/vllm-hpc-installer | 🚀 Automated installation script for vLLM on HPC systems with ROCm support,... | Experimental |
| 27 | unisa-hpc/llm.sycl | The SYCL version of llm.c (for the final project of HPC course 2024, UNISA) | Experimental |
| 28 | GusLovesMath/Local_LLM_Training_Apple_Silicon | Created and enhanced a local LLM training system on Apple Silicon with MLX... | Experimental |
| 29 | ndluna21/nanochat-ascend | Run nanochat training efficiently on Huawei Ascend NPUs with minimal code... | Experimental |
| 30 | vivekptnk/tinybrain | Swift-native on-device LLM inference with live transformer visualization (X-Ray Mode) | Experimental |
| 31 | fabriziosalmi/silicondev | Local LLM fine-tuning and chat for Apple Silicon | Experimental |
| 32 | deeflect/mcclaw | Find which local LLMs actually run on your Mac. 340+ models, hardware-aware... | Experimental |
| 33 | jeorgexyz/lua-llama | Pure Lua implementation of LLaMA inference - educational project exploring... | Experimental |
| 34 | arunsanna/tauri-plugin-mlx | Tauri v2 plugin for local LLM inference on Apple Silicon using Apple MLX... | Experimental |
| 35 | jballo/VALLM | VALLM (Vision Assisted Large Language Model) is a web application that helps... | Experimental |
| 36 | koji/llm_api_template | API template for LLM model with llama.cpp | Experimental |
| 37 | Feyerabend/cc | From Code to Computation: A Modern Guide to Programming and Theory | Experimental |
| 38 | leszkolukasz/moondream-cpp | Moondream VLLM for C++/Qt | Experimental |
| 39 | 1amageek/swift-lm | Hugging Face native LLM inference on Apple Silicon via direct Metal | Experimental |
| 40 | fiveoutofnine/whatcanirun | Find the best models and how to run them locally. | Experimental |
| 41 | adityonugrohoid/vllm-explorer | Probes and catalogs the full vLLM server API: endpoint reference, model... | Experimental |
| 42 | StefanoChiodino/mlx-manager | Sugar coating on the extremely performant but not very user friendly MLX | Experimental |
| 43 | GabrielNetoAUT/tps.sh | Benchmark local and cloud large language models on Apple Silicon by... | Experimental |
| 44 | dev4any1/hyper-stack-4j | Distributed Java-native LLM Inference Engine for commodity CPU/GPU clusters | Experimental |
| 45 | countzero/windows_manage_large_language_models | PowerShell automation to download large language models (LLMs) from Git... | Experimental |
| 46 | WilliamK112/llm-fit | Can my laptop run this model? Instant local LLM fit + speed estimator. | Experimental |
| 47 | amanparuthi8/gpu-llm-india-2026 | Should you buy a DGX Spark or rent H100s? Run on Mac Mini or TAALAS cluster?... | Experimental |
| 48 | GetNyrex/strix-halo-guide | Unlock fast, local LLM inference on AMD-powered mini PCs delivering 65-87... | Experimental |
| 49 | mspronesti/llm.sycl | llm.c, but in SYCL/Intel oneAPI! | Experimental |
| 50 | javi22020/batch-router | Batch LLM inference Python library | Experimental |
| 51 | DunaSpice/JetsonMind | Production-ready AI inference system for NVIDIA Jetson devices with MCP... | Experimental |
| 52 | echenim/hf-batch-downloader | Automate bulk downloads of Hugging Face LLMs with retry logic, manifest... | Experimental |
| 53 | vaccovecrana/rwkv.jni | JNI wrapper for rwkv.cpp | Experimental |
| 54 | tonoy30/Llama | Llama-2 on an Apple Mac using the GPU | Experimental |
| 55 | TheseusInstitute/nix-exllama | Nix derivation for EXLlama | Experimental |