Apple Silicon LLM Inference Tools

Tools and frameworks for optimizing LLM inference, training, and deployment specifically on Apple Silicon (M1/M2/M3) using MLX framework. Includes server implementations, UI wrappers, and performance optimization utilities. Does NOT include general LLM frameworks, non-Apple-specific inference servers, or tools without native MLX/Metal support.

There are 55 Apple Silicon LLM inference tools tracked; 5 score above 50 (the established tier). The highest-rated is jundot/omlx at 65/100 with 4,057 stars, and 4 of the top 10 are actively maintained.

Get the projects as JSON (the example below returns the top 20; raise `limit` to fetch all 55):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=llm-tools&subcategory=apple-silicon-llm-inference&limit=20"
```

Open to everyone: 100 requests/day with no key needed, or get a free key for 1,000 requests/day.

| # | Tool | Description | Score | Tier |
|---|------|-------------|-------|------|
| 1 | jundot/omlx | LLM inference server with continuous batching & SSD caching for Apple... | 65 | Established |
| 2 | josStorer/RWKV-Runner | A RWKV management and startup tool, full automation, only 8MB. And provides... | 59 | Established |
| 3 | waybarrios/vllm-mlx | OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and... | 57 | Established |
| 4 | jordanhubbard/nanolang | A tiny experimental language designed to be targeted by coding LLMs | 55 | Established |
| 5 | akivasolutions/tightwad | Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative... | 52 | Established |
| 6 | petrukha-ivan/mlx-swift-structured | Structured output generation in Swift | 45 | Emerging |
| 7 | parasail-ai/openai-batch | Make OpenAI batch easy to use. | 41 | Emerging |
| 8 | uncSoft/anubis-oss | Local LLM testing & benchmarking for Apple Silicon | 38 | Emerging |
| 9 | mit-han-lab/TinyChatEngine | TinyChatEngine: On-device LLM inference library | 38 | Emerging |
| 10 | eelbaz/dgx-spark-vllm-setup | One-command vLLM installation for NVIDIA DGX Spark with Blackwell GB10 GPUs... | 37 | Emerging |
| 11 | da-z/mlx-ui | A simple web UI / frontend for MLX mlx-lm using Streamlit. | 36 | Emerging |
| 12 | icppWorld/icgpt | On-chain LLMs for the Internet Computer | 34 | Emerging |
| 13 | OpenLMLab/MOSS_Vortex | Moss Vortex is a lightweight and high-performance deployment and inference... | 33 | Emerging |
| 14 | druide67/asiai | Multi-engine LLM benchmark & monitoring CLI for Apple Silicon | 33 | Emerging |
| 15 | Sub-Soft/Siliv | macOS menu-bar utility to adjust Apple Silicon GPU VRAM allocation | 32 | Emerging |
| 16 | N1k1tung/infer-ring | Infer Ring is an iOS and macOS app that facilitates cross-device LLM... | 32 | Emerging |
| 17 | altunenes/calcarine | Desktop VLM: real-time FastVLM analysis of video & textures with live compute shaders | 31 | Emerging |
| 18 | makit/makit-llm-lambda | Example showing how to run an LLM fully inside an AWS Lambda function | 29 | Experimental |
| 19 | seasonjs/rwkv | Pure Go for RWKV | 27 | Experimental |
| 20 | Mattbusel/llm-cpp | The C++ LLM toolkit. 26 single-header libraries for streaming, caching, cost... | 26 | Experimental |
| 21 | Mizistein/omlx | 🤖 Optimize LLM inference on Mac with continuous batching and SSD caching... | 26 | Experimental |
| 22 | jranaraki/vllm-fit | A CLI tool designed to simply recommend (conservative), and/or profile (to... | 26 | Experimental |
| 23 | unit-mesh/edge-infer | EdgeInfer enables efficient edge intelligence by running small AI models,... | 24 | Experimental |
| 24 | ziozzang/Mac_mlx_phi-2_server | Test server code for the Phi-2 model; supports the OpenAI API spec | 24 | Experimental |
| 25 | SemiAnalysisAI/InferenceX-app | Dashboard for InferenceX™, Open Source Continuous Inference | 24 | Experimental |
| 26 | AI-DarwinLabs/vllm-hpc-installer | 🚀 Automated installation script for vLLM on HPC systems with ROCm support,... | 23 | Experimental |
| 27 | unisa-hpc/llm.sycl | The SYCL version of llm.c (for the final project of HPC course 2024, UNISA) | 23 | Experimental |
| 28 | GusLovesMath/Local_LLM_Training_Apple_Silicon | Created and enhanced a local LLM training system on Apple Silicon with MLX... | 23 | Experimental |
| 29 | ndluna21/nanochat-ascend | Run nanochat training efficiently on Huawei Ascend NPUs with minimal code... | 22 | Experimental |
| 30 | vivekptnk/tinybrain | Swift-native on-device LLM inference with live transformer visualization (X-Ray Mode) | 22 | Experimental |
| 31 | fabriziosalmi/silicondev | Local LLM fine-tuning and chat for Apple Silicon | 22 | Experimental |
| 32 | deeflect/mcclaw | Find which local LLMs actually run on your Mac. 340+ models, hardware-aware... | 19 | Experimental |
| 33 | jeorgexyz/lua-llama | Pure Lua implementation of LLaMA inference - educational project exploring... | 19 | Experimental |
| 34 | arunsanna/tauri-plugin-mlx | Tauri v2 plugin for local LLM inference on Apple Silicon using Apple MLX... | 19 | Experimental |
| 35 | jballo/VALLM | VALLM (Vision Assisted Large Language Model) is a web application that helps... | 19 | Experimental |
| 36 | koji/llm_api_template | API template for LLM models with llama.cpp | 17 | Experimental |
| 37 | Feyerabend/cc | From Code to Computation: A Modern Guide to Programming and Theory | 17 | Experimental |
| 38 | leszkolukasz/moondream-cpp | Moondream VLM for C++/Qt | 17 | Experimental |
| 39 | 1amageek/swift-lm | Hugging Face native LLM inference on Apple Silicon via direct Metal | 16 | Experimental |
| 40 | fiveoutofnine/whatcanirun | Find the best models and how to run them locally. | 16 | Experimental |
| 41 | adityonugrohoid/vllm-explorer | Probes and catalogs the full vLLM server API: endpoint reference, model... | 14 | Experimental |
| 42 | StefanoChiodino/mlx-manager | Sugar coating on the extremely performant but not very user-friendly MLX | 14 | Experimental |
| 43 | GabrielNetoAUT/tps.sh | Benchmark local and cloud large language models on Apple Silicon by... | 14 | Experimental |
| 44 | dev4any1/hyper-stack-4j | Distributed Java-native LLM inference engine for commodity CPU/GPU clusters | 14 | Experimental |
| 45 | countzero/windows_manage_large_language_models | PowerShell automation to download large language models (LLMs) from Git... | 14 | Experimental |
| 46 | WilliamK112/llm-fit | Can my laptop run this model? Instant local LLM fit + speed estimator. | 14 | Experimental |
| 47 | amanparuthi8/gpu-llm-india-2026 | Should you buy a DGX Spark or rent H100s? Run on Mac Mini or TAALAS cluster?... | 14 | Experimental |
| 48 | GetNyrex/strix-halo-guide | Unlock fast, local LLM inference on AMD-powered mini PCs delivering 65-87... | 14 | Experimental |
| 49 | mspronesti/llm.sycl | llm.c, but in SYCL/Intel oneAPI! | 13 | Experimental |
| 50 | javi22020/batch-router | Batch LLM inference Python library | 11 | Experimental |
| 51 | DunaSpice/JetsonMind | Production-ready AI inference system for NVIDIA Jetson devices with MCP... | 11 | Experimental |
| 52 | echenim/hf-batch-downloader | Automate bulk downloads of Hugging Face LLMs with retry logic, manifest... | 11 | Experimental |
| 53 | vaccovecrana/rwkv.jni | JNI wrapper for rwkv.cpp | 11 | Experimental |
| 54 | tonoy30/Llama | Llama-2 on an Apple Mac using the GPU | 11 | Experimental |
| 55 | TheseusInstitute/nix-exllama | Nix derivation for EXLlama | 10 | Experimental |