omlx and asiai

omlx, the LLM inference server, complements asiai, the multi-engine LLM benchmark and monitoring CLI: the server provides continuous batching and SSD-cached inference, which the CLI tool can then benchmark and monitor.

                 omlx                asiai
Score            65 (Established)    33 (Emerging)
Maintenance      25/25               13/25
Adoption         10/25               2/25
Maturity         11/25               18/25
Community        19/25               0/25
Stars            4,057               2
Forks            306
Downloads
Commits (30d)    539                 0
Language         Python              Python
License          Apache-2.0          Apache-2.0

omlx: no package published, no dependents. asiai: no dependents.

About omlx

jundot/omlx

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

omlx supports multi-model serving with automatic LRU eviction and manual pinning, alongside vision-language models and embedding/reranker inference, all via OpenAI-compatible API endpoints. The KV cache persists across hot (RAM) and cold (SSD) tiers using block-based management with prefix sharing, restoring cached context from disk on subsequent requests even after server restarts. A built-in web dashboard provides real-time monitoring, per-model configuration (sampling, TTL, aliases), and a direct chat interface, and MCP (Model Context Protocol) support enables tool integration.
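Because the endpoints are OpenAI-compatible, any standard OpenAI client can talk to the server. The sketch below uses the official openai Python package; the base URL, port, API key, and model alias are assumptions for illustration, not documented omlx defaults:

    # Minimal sketch: querying an OpenAI-compatible local server.
    # Base URL, port, and model alias below are assumptions.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # assumed local endpoint
        api_key="not-needed-locally",         # local servers typically ignore the key
    )

    response = client.chat.completions.create(
        model="my-local-model",               # hypothetical model alias
        messages=[{"role": "user",
                   "content": "Summarize continuous batching in one sentence."}],
    )
    print(response.choices[0].message.content)

The hot/cold cache tiering can be pictured with a small sketch as well. The class below is not omlx's implementation, only an illustration of the idea: blocks are keyed by a hash of the token prefix they cover (so identical prefixes are shared across requests), least-recently-used blocks are demoted from RAM to disk, and deterministic keys let the cold tier outlive the process:

    # Minimal sketch (not omlx's actual code) of a two-tier, block-based
    # KV cache with prefix sharing and LRU demotion from RAM to SSD.
    import hashlib
    import os
    import pickle
    import tempfile
    from collections import OrderedDict

    class TieredKVCache:
        def __init__(self, hot_capacity=4, cold_dir=None):
            self.hot = OrderedDict()          # block key -> KV block (RAM, LRU order)
            self.hot_capacity = hot_capacity  # hypothetical capacity, in blocks
            self.cold_dir = cold_dir or tempfile.mkdtemp(prefix="kvcache-")

        @staticmethod
        def block_key(prefix_token_ids):
            # Prefix sharing: key each block by a deterministic hash of the
            # whole token prefix it covers, so identical prefixes reuse blocks.
            return hashlib.sha256(repr(tuple(prefix_token_ids)).encode()).hexdigest()

        def put(self, prefix_token_ids, kv_block):
            key = self.block_key(prefix_token_ids)
            self.hot[key] = kv_block
            self.hot.move_to_end(key)
            if len(self.hot) > self.hot_capacity:
                # Demote the least-recently-used block to the cold (SSD) tier.
                old_key, old_block = self.hot.popitem(last=False)
                with open(os.path.join(self.cold_dir, old_key), "wb") as f:
                    pickle.dump(old_block, f)

        def get(self, prefix_token_ids):
            key = self.block_key(prefix_token_ids)
            if key in self.hot:                         # hot hit
                self.hot.move_to_end(key)
                return self.hot[key]
            path = os.path.join(self.cold_dir, key)
            if os.path.exists(path):                    # cold hit: restore from disk
                with open(path, "rb") as f:
                    kv_block = pickle.load(f)
                self.put(prefix_token_ids, kv_block)    # promote back to RAM
                return kv_block
            return None                                 # miss: caller recomputes

Because the keys are content hashes rather than in-memory identities, a cold tier written before a restart can still serve hits afterwards, which is the property the description above calls restoring cached context even after server restarts.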

About asiai

druide67/asiai

Multi-engine LLM benchmark & monitoring CLI for Apple Silicon
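A rough sketch of the kind of measurement such a CLI performs is shown below. This is not asiai's actual code; the endpoint, port, and model alias are assumptions. It times one completion against an OpenAI-compatible server and reports decode throughput:

    # Minimal sketch (not asiai's code): time a completion against an
    # OpenAI-compatible endpoint and report tokens per second.
    import time
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1",  # assumed endpoint
                    api_key="unused")

    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="my-local-model",  # hypothetical alias served by the inference server
        messages=[{"role": "user", "content": "Count from 1 to 50."}],
    )
    elapsed = time.perf_counter() - start

    tokens = resp.usage.completion_tokens
    print(f"{tokens} tokens in {elapsed:.2f}s ({tokens / elapsed:.1f} tok/s)")

A real benchmark would repeat this across engines, concurrency levels, and prompt lengths, and would separate time-to-first-token from decode throughput; the single-request timing above only shows the core measurement.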

Scores are updated daily from GitHub, PyPI, and npm data.