Apple Silicon LLM Inference Tools
Tools and frameworks for optimizing LLM inference, training, and deployment specifically on Apple Silicon (M1/M2/M3) using the MLX framework. Includes server implementations, UI wrappers, and performance optimization utilities. Does NOT include general LLM frameworks, non-Apple-specific inference servers, or tools without native MLX/Metal support.
There are 55 Apple Silicon LLM inference tools tracked. Five score above 50 (the established tier). The highest-rated is jundot/omlx at 65/100 with 4,057 stars. Four of the top 10 are actively maintained.
Get all 55 projects as JSON (raise `limit` as needed):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=apple-silicon-llm-inference&limit=20"
```

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
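The endpoint above can also be scripted. Below is a minimal Python sketch, assuming the response decodes to a JSON list of records with `name`, `score`, and `tier` fields (those field names are guesses; inspect the actual payload before relying on them):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Assemble the dataset query URL from its documented parameters."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def by_tier(projects: list[dict], tier: str) -> list[dict]:
    """Keep only records in one tier ('Established', 'Emerging', or 'Experimental')."""
    return [p for p in projects if p.get("tier") == tier]


def fetch(domain: str, subcategory: str, limit: int = 20) -> list[dict]:
    """Fetch and decode the dataset (keyless access is capped at 100 requests/day)."""
    with urlopen(build_url(domain, subcategory, limit)) as resp:
        return json.load(resp)
```

For example, `by_tier(fetch("llm-tools", "apple-silicon-llm-inference", limit=55), "Established")` would pull the full list and keep the five established-tier entries, subject to the daily quota.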
| # | Tool | Description | Tier |
|---|---|---|---|
| 1 | jundot/omlx | LLM inference server with continuous batching & SSD caching for Apple... | Established |
| 2 | josStorer/RWKV-Runner | A RWKV management and startup tool, full automation, only 8MB. And provides... | Established |
| 3 | waybarrios/vllm-mlx | OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and... | Established |
| 4 | jordanhubbard/nanolang | A tiny experimental language designed to be targeted by coding LLMs | Established |
| 5 | akivasolutions/tightwad | Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative... | Established |
| 6 | petrukha-ivan/mlx-swift-structured | Structured output generation in Swift | Emerging |
| 7 | parasail-ai/openai-batch | Make OpenAI batch easy to use. | Emerging |
| 8 | uncSoft/anubis-oss | Local LLM Testing & Benchmarking for Apple Silicon | Emerging |
| 9 | mit-han-lab/TinyChatEngine | TinyChatEngine: On-Device LLM Inference Library | Emerging |
| 10 | eelbaz/dgx-spark-vllm-setup | One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs... | Emerging |
| 11 | da-z/mlx-ui | A simple UI / Web / Frontend for MLX mlx-lm using Streamlit. | Emerging |
| 12 | icppWorld/icgpt | On-chain LLMs for the Internet Computer | Emerging |
| 13 | OpenLMLab/MOSS_Vortex | Moss Vortex is a lightweight and high-performance deployment and inference... | Emerging |
| 14 | druide67/asiai | Multi-engine LLM benchmark & monitoring CLI for Apple Silicon | Emerging |
| 15 | Sub-Soft/Siliv | macOS menu-bar utility to adjust Apple Silicon GPU VRAM allocation | Emerging |
| 16 | N1k1tung/infer-ring | Infer Ring is an iOS and macOS app that facilitates cross-device LLM... | Emerging |
| 17 | altunenes/calcarine | Desktop VLM: Real-time FastVLM analysis of video & textures with live compute shaders | Emerging |
| 18 | makit/makit-llm-lambda | Example showing how to run a LLM fully inside an AWS Lambda Function | Experimental |
| 19 | seasonjs/rwkv | Pure Go for RWKV | Experimental |
| 20 | Mattbusel/llm-cpp | The C++ LLM toolkit. 26 single-header libraries for streaming, caching, cost... | Experimental |
| 21 | Mizistein/omlx | 🤖 Optimize LLM inference on Mac with continuous batching and SSD caching... | Experimental |
| 22 | jranaraki/vllm-fit | A CLI tool designed to simply recommend (conservative), and/or profile (to... | Experimental |
| 23 | unit-mesh/edge-infer | EdgeInfer enables efficient edge intelligence by running small AI models,... | Experimental |
| 24 | ziozzang/Mac_mlx_phi-2_server | Test server code for the Phi-2 model; supports the OpenAI API spec | Experimental |
| 25 | SemiAnalysisAI/InferenceX-app | Dashboard for InferenceX™, Open Source Continuous Inference | Experimental |
| 26 | AI-DarwinLabs/vllm-hpc-installer | 🚀 Automated installation script for vLLM on HPC systems with ROCm support,... | Experimental |
| 27 | unisa-hpc/llm.sycl | The SYCL version of llm.c (for the final project of HPC course 2024, UNISA) | Experimental |
| 28 | GusLovesMath/Local_LLM_Training_Apple_Silicon | Created and enhanced a local LLM training system on Apple Silicon with MLX... | Experimental |
| 29 | ndluna21/nanochat-ascend | Run nanochat training efficiently on Huawei Ascend NPUs with minimal code... | Experimental |
| 30 | vivekptnk/tinybrain | Swift-native on-device LLM inference with live transformer visualization (X-Ray Mode) | Experimental |
| 31 | fabriziosalmi/silicondev | Local LLM fine-tuning and chat for Apple Silicon | Experimental |
| 32 | deeflect/mcclaw | Find which local LLMs actually run on your Mac. 340+ models, hardware-aware... | Experimental |
| 33 | jeorgexyz/lua-llama | Pure Lua implementation of LLaMA inference - educational project exploring... | Experimental |
| 34 | arunsanna/tauri-plugin-mlx | Tauri v2 plugin for local LLM inference on Apple Silicon using Apple MLX... | Experimental |
| 35 | jballo/VALLM | VALLM (Vision Assisted Large Language Model) is a web application that helps... | Experimental |
| 36 | koji/llm_api_template | API template for LLM model with llama.cpp | Experimental |
| 37 | Feyerabend/cc | From Code to Computation: A Modern Guide to Programming and Theory | Experimental |
| 38 | leszkolukasz/moondream-cpp | Moondream VLLM for C++/Qt | Experimental |
| 39 | 1amageek/swift-lm | Hugging Face native LLM inference on Apple Silicon via direct Metal | Experimental |
| 40 | fiveoutofnine/whatcanirun | Find the best models and how to run them locally. | Experimental |
| 41 | adityonugrohoid/vllm-explorer | Probes and catalogs the full vLLM server API: endpoint reference, model... | Experimental |
| 42 | StefanoChiodino/mlx-manager | Sugar coating on the extremely performant but not very user friendly MLX | Experimental |
| 43 | GabrielNetoAUT/tps.sh | Benchmark local and cloud large language models on Apple Silicon by... | Experimental |
| 44 | dev4any1/hyper-stack-4j | Distributed Java-native LLM Inference Engine for commodity CPU/GPU clusters | Experimental |
| 45 | countzero/windows_manage_large_language_models | PowerShell automation to download large language models (LLMs) from Git... | Experimental |
| 46 | WilliamK112/llm-fit | Can my laptop run this model? Instant local LLM fit + speed estimator. | Experimental |
| 47 | amanparuthi8/gpu-llm-india-2026 | Should you buy a DGX Spark or rent H100s? Run on Mac Mini or TAALAS cluster?... | Experimental |
| 48 | GetNyrex/strix-halo-guide | Unlock fast, local LLM inference on AMD-powered mini PCs delivering 65-87... | Experimental |
| 49 | mspronesti/llm.sycl | llm.c, but in SYCL/Intel oneAPI! | Experimental |
| 50 | javi22020/batch-router | Batch LLM inference Python library | Experimental |
| 51 | DunaSpice/JetsonMind | Production-ready AI inference system for NVIDIA Jetson devices with MCP... | Experimental |
| 52 | echenim/hf-batch-downloader | Automate bulk downloads of Hugging Face LLMs with retry logic, manifest... | Experimental |
| 53 | vaccovecrana/rwkv.jni | JNI wrapper for rwkv.cpp | Experimental |
| 54 | tonoy30/Llama | Llama-2 on an Apple Mac using the GPU | Experimental |
| 55 | TheseusInstitute/nix-exllama | Nix derivation for EXLlama | Experimental |