jjang-ai/vmlx
vMLX - Continuous batching, prefix caching, paged KV cache, KV-cache quantization, and vision-language support. Powers MLX Studio. Image generation/editing, OpenAI/Anthropic-compatible APIs.
Implements adaptive mixed-precision quantization (JANG 2-bit) with a 5-layer cache hierarchy combining memory-aware prefix caching, paged KV blocks, and persistent disk storage with LRU eviction. Built on MLX's Metal GPU acceleration for Apple Silicon, it exposes OpenAI/Anthropic-compatible REST APIs while supporting tool calling, reasoning modes, and multimodal inference across LLMs, VLMs, MoE models, and diffusion-based image generation.
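The LRU eviction over paged KV blocks described above can be sketched roughly as follows. This is a minimal illustration of the technique, not vMLX's actual implementation; the class and method names are invented:

```python
from collections import OrderedDict


class PagedKVBlockCache:
    """Illustrative sketch: a fixed-capacity pool of paged KV-cache
    blocks with LRU eviction, keyed by a prefix hash (names invented)."""

    def __init__(self, max_blocks: int):
        self.max_blocks = max_blocks
        self._blocks: OrderedDict[str, bytes] = OrderedDict()

    def get(self, prefix_hash: str):
        # A cache hit moves the block to the most-recently-used position.
        if prefix_hash not in self._blocks:
            return None
        self._blocks.move_to_end(prefix_hash)
        return self._blocks[prefix_hash]

    def put(self, prefix_hash: str, block: bytes) -> None:
        # Insert or refresh; evict the least-recently-used block when full.
        self._blocks[prefix_hash] = block
        self._blocks.move_to_end(prefix_hash)
        while len(self._blocks) > self.max_blocks:
            self._blocks.popitem(last=False)  # drop the LRU entry


cache = PagedKVBlockCache(max_blocks=2)
cache.put("a", b"kv-a")
cache.put("b", b"kv-b")
cache.get("a")           # touch "a", so "b" becomes least recently used
cache.put("c", b"kv-c")  # capacity exceeded: "b" is evicted
```

In a real multi-tier setup, the evicted block would be demoted to the next cache layer (e.g. persistent disk storage) rather than discarded.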
Available on PyPI.
Stars: 15
Forks: 1
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 18, 2026
Commits (30d): 0
Dependencies: 19
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/jjang-ai/vmlx"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
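The same endpoint can be called from Python with the standard library. A minimal sketch: the URL shape is taken from the curl example above, but the response schema is not documented here, so the body is returned as raw bytes:

```python
import urllib.parse
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/vector-db"


def quality_url(owner: str, repo: str) -> str:
    # Build the per-repo quality endpoint URL from the curl example.
    return f"{BASE}/{urllib.parse.quote(owner)}/{urllib.parse.quote(repo)}"


def fetch_quality(owner: str, repo: str) -> bytes:
    # Anonymous access is rate-limited to 100 requests/day (see above);
    # the response format is not specified here, so return raw bytes.
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return resp.read()


if __name__ == "__main__":
    print(fetch_quality("jjang-ai", "vmlx"))
```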
Higher-rated alternatives
topoteretes/cognee
Knowledge Engine for AI Agent Memory in 6 lines of code
divagr18/memlayer
Plug-and-play memory for LLMs in 3 lines of code. Add persistent, intelligent, human-like memory...
verygoodplugins/automem
AutoMem is a graph-vector memory service that gives AI assistants durable, relational memory:
CortexReach/memory-lancedb-pro
Enhanced LanceDB memory plugin for OpenClaw — Hybrid Retrieval (Vector + BM25), Cross-Encoder...
CaviraOSS/OpenMemory
Local persistent memory store for LLM applications including Claude Desktop, GitHub Copilot,...