llm-db/MLKV

MLKV: Efficiently Scaling up Large Embedding Model Training with Disk-based Key-Value Storage (ICDE 2025 Industry Track)

Score: 34 / 100 (Emerging)

MLKV helps machine learning engineers accelerate the training of large embedding models. It manages massive embedding tables by keeping frequently accessed embeddings in GPU memory and spilling less critical data to disk-based key-value storage, reducing the data-transfer bottlenecks that slow training on NVIDIA GPUs.
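The hot/cold tiering idea can be illustrated with a minimal sketch. This is not MLKV's actual API; it is a pure-Python stand-in where a small dict plays the role of the fast GPU tier and a second dict plays the role of the disk-based key-value store:

```python
class TieredEmbeddingStore:
    """Illustrative two-tier embedding store: a small "hot" tier
    (standing in for GPU memory) backed by a larger "cold" tier
    (standing in for disk-based key-value storage). Names and
    eviction policy are hypothetical, not MLKV's."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = {}   # key -> embedding vector, fast tier
        self.cold = {}  # key -> embedding vector, slow tier (disk stand-in)

    def put(self, key, vec):
        # The cold tier is the source of truth for every embedding.
        self.cold[key] = vec

    def get(self, key):
        # Hot hit: no simulated "device transfer" needed.
        if key in self.hot:
            return self.hot[key]
        vec = self.cold[key]  # simulate a disk read
        # Promote to the hot tier, evicting an arbitrary entry if full.
        if len(self.hot) >= self.hot_capacity:
            self.hot.pop(next(iter(self.hot)))
        self.hot[key] = vec
        return vec


store = TieredEmbeddingStore(hot_capacity=2)
for i in range(5):
    store.put(i, [float(i)] * 4)
v = store.get(3)  # cold read, promoted to the hot tier
v = store.get(3)  # now served from the hot tier
```

A real implementation would pin hot embeddings in GPU memory and use access-frequency statistics (not arbitrary eviction) to decide what stays resident.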

Use this if you are a machine learning engineer or researcher training large-scale embedding models and frequently encounter GPU memory limitations or slow data access from disk.

Not ideal if you are working with smaller models that fit entirely within GPU memory or if your primary bottleneck isn't data loading for embeddings.

Tags: Machine-Learning-Engineering, Deep-Learning-Training, Embedding-Models, GPU-Optimization, Big-Data-ML
No Package No Dependents
Maintenance 6 / 25
Adoption 4 / 25
Maturity 16 / 25
Community 8 / 25


Stars: 8
Forks: 1
Language:
License: MIT
Last pushed: Dec 15, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/llm-db/MLKV"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
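The same endpoint can be called programmatically. Below is a minimal standard-library sketch; the response schema is not documented on this page, so the example simply parses whatever JSON the server returns, and it uses only the anonymous tier since the key-based auth mechanism is not shown here:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/embeddings"


def quality_url(owner: str, repo: str) -> str:
    # Build the per-repository quality endpoint URL.
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    # Anonymous access: 100 requests/day, no API key required.
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("llm-db", "MLKV")
    print(json.dumps(data, indent=2))
```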