jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
Supports multi-model serving with automatic LRU eviction and manual pinning, alongside vision-language models and embedding/reranker inference, all via OpenAI-compatible API endpoints. The KV cache persists across hot (RAM) and cold (SSD) tiers using block-based management with prefix sharing, restoring cached context from disk on subsequent requests even after server restarts. Includes a built-in web dashboard for real-time monitoring, per-model configuration (sampling, TTL, aliases), and a direct chat interface, with MCP (Model Context Protocol) support for tool integration.
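Because the endpoints are OpenAI-compatible, any standard OpenAI client can talk to a running omlx server. The sketch below uses the official openai Python package; the base URL, port, and model name are assumptions that depend on your local configuration, not values documented on this page.

from openai import OpenAI

# A minimal sketch, assuming an omlx server is already running locally.
# The port (8080) and model identifier are placeholders; check your
# omlx configuration for the actual values.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Explain prefix sharing in one sentence."}],
)
print(response.choices[0].message.content)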
4,057 stars. Actively maintained with 539 commits in the last 30 days.
Stars: 4,057
Forks: 306
Language: Python
License: Apache-2.0
Category: llm-tools
Last pushed: Mar 13, 2026
Commits (30d): 539
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jundot/omlx"
Open to everyone: 100 requests/day with no key needed. A free API key raises the limit to 1,000 requests/day.
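For scripted access, the same endpoint can be called from Python. Below is a minimal sketch using the requests library, assuming the endpoint returns JSON as suggested by the curl example above; the response is printed as-is because its schema is not documented in this listing.

import requests

# Fetch the repository metrics shown on this page. No API key is required
# for up to 100 requests/day. The JSON is treated as opaque here because
# the field names are not documented in this listing.
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jundot/omlx"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())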
Related tools
waybarrios/vllm-mlx
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models...
jordanhubbard/nanolang
A tiny experimental language designed to be targeted by coding LLMs
josStorer/RWKV-Runner
A RWKV management and startup tool with full automation, only 8 MB. Provides an interface...
akivasolutions/tightwad
Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative decoding proxy gives you...
petrukha-ivan/mlx-swift-structured
Structured output generation in Swift