jundot/omlx

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

Score: 65 / 100 (Established)

Supports multi-model serving with automatic LRU eviction and manual pinning, plus vision-language models and embedding/reranker inference, all through OpenAI-compatible API endpoints. The KV cache persists across hot (RAM) and cold (SSD) tiers using block-based management with prefix sharing, so cached context is restored from disk on subsequent requests even after server restarts. Includes a built-in web dashboard for real-time monitoring, per-model configuration (sampling, TTL, aliases), and a direct chat interface, with MCP (Model Context Protocol) support for tool integration.
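
Since the endpoints are OpenAI-compatible, any standard OpenAI client can talk to a running omlx server. A minimal sketch in Python, assuming the server listens on http://localhost:8000 and serves a model aliased "my-model" (both hypothetical; adjust to your setup):

    from openai import OpenAI

    # Point the standard OpenAI client at the local omlx server.
    # Base URL, API key, and model alias below are assumptions.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    response = client.chat.completions.create(
        model="my-model",
        messages=[{"role": "user", "content": "Summarize what omlx does."}],
    )
    print(response.choices[0].message.content)

Because the KV cache is persisted to SSD, repeating a request with the same long prefix after a server restart should reuse the cached blocks rather than recomputing them.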

4,057 stars. Actively maintained with 539 commits in the last 30 days.

No package · No dependents
Maintenance: 25 / 25
Adoption: 10 / 25
Maturity: 11 / 25
Community: 19 / 25
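
The overall score is the sum of the four 25-point categories: 25 + 10 + 11 + 19 = 65, matching the 65 / 100 headline.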

Stars: 4,057
Forks: 306
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Commits (30d): 539

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jundot/omlx"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
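
For scripting, the same endpoint can be fetched from Python. A minimal sketch using requests; the response schema is not documented in this listing, so the example only inspects the JSON it gets back:

    import requests

    URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/jundot/omlx"

    # Unauthenticated access is rate-limited to 100 requests/day.
    resp = requests.get(URL, timeout=10)
    resp.raise_for_status()

    data = resp.json()
    # Field names are not documented here; list the top-level keys.
    print(sorted(data.keys()))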