llm-db/MLKV
MLKV: Efficiently Scaling up Large Embedding Model Training with Disk-based Key-Value Storage (ICDE 2025 Industry Track)
MLKV helps machine learning engineers accelerate the training of large embedding models. It manages massive embedding tables by keeping frequently used embeddings in fast GPU memory while spilling less frequently accessed data to disk-based key-value storage, reducing the data-transfer bottlenecks that slow down training on NVIDIA GPUs.
Use this if you are a machine learning engineer or researcher training large-scale embedding models and frequently encounter GPU memory limitations or slow data access from disk.
Not ideal if you are working with smaller models that fit entirely within GPU memory or if your primary bottleneck isn't data loading for embeddings.
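The tiered-storage idea described above (hot embeddings in fast memory, cold ones on disk) can be illustrated with a generic two-tier LRU cache. This is a minimal sketch of the general technique only, not MLKV's actual implementation; the class and tier names are hypothetical, and a plain dict stands in for the disk-backed store.

```python
# Generic two-tier LRU cache illustrating hot/cold embedding placement.
# NOT MLKV's implementation: class names and tiers are illustrative only.
from collections import OrderedDict

class TieredEmbeddingStore:
    def __init__(self, hot_capacity: int):
        self.hot = OrderedDict()   # stands in for GPU/DRAM-resident embeddings
        self.cold = {}             # stands in for disk-resident embeddings
        self.hot_capacity = hot_capacity

    def put(self, key, embedding):
        """Insert or update an embedding; it always lands in the hot tier."""
        self.cold.pop(key, None)
        self.hot[key] = embedding
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            # Evict the least recently used embedding to the cold tier.
            old_key, old_val = self.hot.popitem(last=False)
            self.cold[old_key] = old_val

    def get(self, key):
        """Fetch an embedding, promoting cold hits back into the hot tier."""
        if key in self.hot:
            self.hot.move_to_end(key)   # mark as recently used
            return self.hot[key]
        if key in self.cold:
            emb = self.cold.pop(key)
            self.put(key, emb)          # promotion may evict another entry
            return emb
        raise KeyError(key)
```

In a real system the cold tier would be an on-disk key-value store and eviction would move whole batches of embedding rows, but the access pattern (serve hot keys from memory, promote on cold hits) is the same.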
Stars: 8
Forks: 1
Language: —
License: MIT
Category: embeddings
Last pushed: Dec 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/llm-db/MLKV"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
Accenture/AmpliGraph
Python library for Representation Learning on Knowledge Graphs https://docs.ampligraph.org
benedekrozemberczki/graph2vec
A parallel implementation of "graph2vec: Learning Distributed Representations of Graphs"...
DeepGraphLearning/graphvite
GraphVite: A General and High-performance Graph Embedding System
bi-graph/Emgraph
A Python library for knowledge graph representation learning (graph embedding).
nju-websoft/muKG
μKG: A Library for Multi-source Knowledge Graph Embeddings and Applications, ISWC 2022