jjang-ai/vmlx
vMLX - Continuous batching, prefix caching, paged KV cache, KV-cache quantization, and vision-language support. Powers MLX Studio. Image generation/editing, OpenAI/Anthropic-compatible APIs.
Implements adaptive mixed-precision quantization (JANG 2-bit) with a 5-layer cache hierarchy combining memory-aware prefix caching, paged KV blocks, and persistent disk storage with LRU eviction. Built on MLX's Metal GPU acceleration for Apple Silicon, it exposes OpenAI/Anthropic-compatible REST APIs while supporting tool calling, reasoning modes, and multimodal inference across LLMs, VLMs, MoE models, and diffusion-based image generation.
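The LRU eviction over paged KV blocks described above can be sketched roughly as follows. This is a minimal illustration of the technique, not vMLX's actual implementation; the class and method names are invented:

```python
from collections import OrderedDict


class PagedKVBlockCache:
    """Illustrative sketch: a fixed-capacity pool of paged KV-cache
    blocks with LRU eviction, keyed by a prefix hash (names invented)."""

    def __init__(self, max_blocks: int):
        self.max_blocks = max_blocks
        self._blocks: OrderedDict[str, bytes] = OrderedDict()

    def get(self, prefix_hash: str):
        # A cache hit moves the block to the most-recently-used position.
        if prefix_hash not in self._blocks:
            return None
        self._blocks.move_to_end(prefix_hash)
        return self._blocks[prefix_hash]

    def put(self, prefix_hash: str, block: bytes) -> None:
        # Insert or refresh; evict the least-recently-used block when full.
        self._blocks[prefix_hash] = block
        self._blocks.move_to_end(prefix_hash)
        while len(self._blocks) > self.max_blocks:
            self._blocks.popitem(last=False)  # drop the LRU entry


cache = PagedKVBlockCache(max_blocks=2)
cache.put("a", b"kv-a")
cache.put("b", b"kv-b")
cache.get("a")           # touch "a", so "b" becomes least recently used
cache.put("c", b"kv-c")  # capacity exceeded: "b" is evicted
```

In a real multi-tier setup, the evicted block would be demoted to the next cache layer (e.g. persistent disk storage) rather than discarded.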
Available on PyPI.
Stars: 15
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 18, 2026
Commits (30d): 0
Dependencies: 19
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/jjang-ai/vmlx"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
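The same endpoint can be called from Python with the standard library. A minimal sketch: the URL shape is taken from the curl example above, but the response schema is not documented here, so the body is returned as raw bytes:

```python
import urllib.parse
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/vector-db"


def quality_url(owner: str, repo: str) -> str:
    # Build the per-repo quality endpoint URL from the curl example.
    return f"{BASE}/{urllib.parse.quote(owner)}/{urllib.parse.quote(repo)}"


def fetch_quality(owner: str, repo: str) -> bytes:
    # Anonymous access is rate-limited to 100 requests/day (see above);
    # the response format is not specified here, so return raw bytes.
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return resp.read()


if __name__ == "__main__":
    print(fetch_quality("jjang-ai", "vmlx"))
```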
Higher-rated alternatives
topoteretes/cognee
Knowledge Engine for AI Agent Memory in 6 lines of code
divagr18/memlayer
Plug-and-play memory for LLMs in 3 lines of code. Add persistent, intelligent, human-like memory...
verygoodplugins/automem
AutoMem is a graph-vector memory service that gives AI assistants durable, relational memory:
CortexReach/memory-lancedb-pro
Enhanced LanceDB memory plugin for OpenClaw — Hybrid Retrieval (Vector + BM25), Cross-Encoder...
CaviraOSS/OpenMemory
Local persistent memory store for LLM applications including Claude Desktop, GitHub Copilot,...