llm-db/MLKV

MLKV: Efficiently Scaling up Large Embedding Model Training with Disk-based Key-Value Storage (ICDE 2025 Industry Track)

Score: 34 / 100 (Emerging)

MLKV helps machine learning engineers accelerate the training of large embedding models. It manages massive embedding tables by keeping frequently accessed embeddings in GPU memory and spilling less critical data to disk-based key-value storage, reducing the data-transfer bottlenecks that slow training on NVIDIA GPUs.
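The hot/cold tiering idea can be illustrated with a minimal sketch. This is not MLKV's actual API; it is a pure-Python stand-in where a small dict plays the role of the fast GPU tier and a second dict plays the role of the disk-based key-value store:

```python
class TieredEmbeddingStore:
    """Illustrative two-tier embedding store: a small "hot" tier
    (standing in for GPU memory) backed by a larger "cold" tier
    (standing in for disk-based key-value storage). Names and
    eviction policy are hypothetical, not MLKV's."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = {}   # key -> embedding vector, fast tier
        self.cold = {}  # key -> embedding vector, slow tier (disk stand-in)

    def put(self, key, vec):
        # The cold tier is the source of truth for every embedding.
        self.cold[key] = vec

    def get(self, key):
        # Hot hit: no simulated "device transfer" needed.
        if key in self.hot:
            return self.hot[key]
        vec = self.cold[key]  # simulate a disk read
        # Promote to the hot tier, evicting an arbitrary entry if full.
        if len(self.hot) >= self.hot_capacity:
            self.hot.pop(next(iter(self.hot)))
        self.hot[key] = vec
        return vec


store = TieredEmbeddingStore(hot_capacity=2)
for i in range(5):
    store.put(i, [float(i)] * 4)
v = store.get(3)  # cold read, promoted to the hot tier
v = store.get(3)  # now served from the hot tier
```

A real implementation would pin hot embeddings in GPU memory and use access-frequency statistics (not arbitrary eviction) to decide what stays resident.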

Use this if you are a machine learning engineer or researcher training large-scale embedding models and frequently encounter GPU memory limitations or slow data access from disk.

Not ideal if you are working with smaller models that fit entirely within GPU memory or if your primary bottleneck isn't data loading for embeddings.

Tags: Machine-Learning-Engineering, Deep-Learning-Training, Embedding-Models, GPU-Optimization, Big-Data-ML
No Package No Dependents
Maintenance 6 / 25
Adoption 4 / 25
Maturity 16 / 25
Community 8 / 25


Stars: 8
Forks: 1
Language:
License: MIT
Last pushed: Dec 15, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/llm-db/MLKV"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
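The same endpoint can be called programmatically. Below is a minimal standard-library sketch; the response schema is not documented on this page, so the example simply parses whatever JSON the server returns, and it uses only the anonymous tier since the key-based auth mechanism is not shown here:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality/embeddings"


def quality_url(owner: str, repo: str) -> str:
    # Build the per-repository quality endpoint URL.
    return f"{BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    # Anonymous access: 100 requests/day, no API key required.
    with urllib.request.urlopen(quality_url(owner, repo), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("llm-db", "MLKV")
    print(json.dumps(data, indent=2))
```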