messkan/prompt-cache

Cut LLM costs by up to 80% and unlock sub-millisecond responses with intelligent semantic caching. A drop-in, provider-agnostic LLM proxy written in Go with sub-millisecond response times.

Quality score: 47 / 100 (Emerging)

Implements a two-stage semantic similarity verification system combining high/low thresholds for direct cache hits and misses, with optional LLM-based intent verification for ambiguous cases to prevent hallucination risks. Integrates with multiple embedding providers (OpenAI, Mistral, Claude/Voyage) via a plugin architecture, exposing an OpenAI-compatible API endpoint (`/v1`) for transparent middleware deployment. Includes production monitoring with Prometheus metrics, structured logging, cache management API, and graceful shutdown for Kubernetes/containerized environments.

209 stars.

No package published · No dependents
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 13 / 25
Community: 14 / 25


Stars: 209
Forks: 19
Language: Go
License: MIT
Last pushed: Jan 25, 2026
Commits (30d): 0

Get this data via API:

curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/messkan/prompt-cache"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.