# Distributed Training Frameworks
Frameworks and libraries for distributed training of machine learning models across multiple GPUs, nodes, or devices using data parallelism, model parallelism, or hybrid approaches. Does NOT include single-machine training optimization, inference frameworks, or educational tutorials on distributed concepts without working implementations.
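Data parallelism, mentioned above, rests on a simple identity: for a loss defined as a mean over examples, averaging the per-shard gradients reproduces the full-batch gradient, which is why an all-reduce of gradients is enough to keep model replicas in sync. A minimal, framework-free sketch (plain Python, a hypothetical one-parameter linear model, not taken from any listed project):

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y ≈ w * x with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Single worker: gradient over the full batch.
full = grad_mse(w, xs, ys)

# Two "workers": each computes the gradient on its own equal-sized shard,
# then an all-reduce averages the results (simulated here as a plain mean).
shard_grads = [grad_mse(w, xs[:2], ys[:2]), grad_mse(w, xs[2:], ys[2:])]
averaged = sum(shard_grads) / len(shard_grads)

print(full, averaged)  # identical: -22.5 -22.5
```

The frameworks in the table below (DeepSpeed, Horovod, byteps, and others) implement this same averaging with fused, bandwidth-optimal all-reduce collectives across GPUs and nodes.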
There are 116 distributed training frameworks tracked; 4 score above 70 (verified tier). The highest-rated is deepspeedai/DeepSpeed at 94/100, with 41,801 stars and 1,187,695 monthly downloads. Only 1 of the top 10 is actively maintained.
Get all 116 projects as JSON (the `limit` query parameter caps results per request; the example below fetches 20):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=distributed-training-frameworks&limit=20"
```

The API is open to everyone: 100 requests/day with no key, or 1,000/day with a free key.
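The same query can be built from Python's standard library instead of curl. A hedged sketch: the response field names used below (`projects`, `name`, `score`) are assumptions for illustration, not documented schema, and the only score taken from this page is DeepSpeed's 94/100.

```python
import json
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"
query = urlencode({
    "domain": "ml-frameworks",
    "subcategory": "distributed-training-frameworks",
    "limit": 20,
})
url = f"{BASE}?{query}"
print(url)

# Offline stand-in for the JSON body; in practice, fetch `url` (e.g. with
# urllib.request or requests) and parse the response the same way.
sample = json.loads("""
{"projects": [
    {"name": "deepspeedai/DeepSpeed", "score": 94},
    {"name": "example/hypothetical", "score": 42}
]}
""")
verified = [p["name"] for p in sample["projects"] if p["score"] > 70]
print(verified)  # ['deepspeedai/DeepSpeed']
```

The 100-request/day unauthenticated quota makes a cached local copy of the JSON worthwhile if you iterate on filters.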
| # | Framework | Description | Tier |
|---|---|---|---|
| 1 | deepspeedai/DeepSpeed | DeepSpeed is a deep learning optimization library that makes distributed... | Verified |
| 2 | horovod/horovod | Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. | Verified |
| 3 | helmholtz-analytics/heat | Distributed tensors and machine learning framework with GPU and MPI... | Verified |
| 4 | bsc-wdc/dislib | The distributed computing library for Python, implemented using PyCOMPSs... | Verified |
| 5 | google/sedpack | Sedpack: scalable and efficient data packing. | Established |
| 6 | xorbitsai/xorbits | Scalable Python DS & ML, in an API-compatible and lightning-fast way. | Established |
| 7 | hpcaitech/ColossalAI | Making large AI models cheaper, faster, and more accessible. | Established |
| 8 | learning-at-home/hivemind | Decentralized deep learning in PyTorch. Built to train models on thousands... | Established |
| 9 | HazyResearch/fonduer | A knowledge base construction engine for richly formatted data. | Established |
| 10 | kakaobrain/torchgpipe | A GPipe implementation in PyTorch. | Established |
| 11 | cylondata/cylon | Cylon is a fast, scalable, distributed-memory, parallel runtime with a... | Established |
| 12 | spotify/pythonflow | Dataflow programming for Python. | Established |
| 13 | fastai/fastgpu | A queue service for quickly developing scripts that use all your GPUs efficiently. | Established |
| 14 | NimbleBoxAI/nbox | The official Python package for NimbleBox. Exposes all APIs as CLIs and... | Established |
| 15 | btursunbayev/nvsonar | Active GPU diagnostic tool that identifies performance bottlenecks using micro-probes. | Established |
| 16 | TGSAI/mdio-python | Cloud-native, scalable storage engine for various types of energy data. | Established |
| 17 | BaguaSys/bagua | Bagua speeds up PyTorch. | Established |
| 18 | PanJinquan/Pytorch-Base-Trainer | A PyTorch distributed training framework. | Established |
| 19 | Mitchell-Mirano/sorix | Sorix: high performance, easy to learn, fast to code, from prototype to production. | Established |
| 20 | maxpumperla/elephas | Distributed deep learning with Keras & Spark. | Established |
| 21 | cerndb/dist-keras | Distributed deep learning, with a focus on distributed training, using Keras... | Established |
| 22 | IBM/FfDL | Fabric for Deep Learning (FfDL, pronounced "fiddle") is a deep learning... | Established |
| 23 | firmai/pandapy | PandaPy has the speed of NumPy and the usability of Pandas; 10x to 50x faster... | Established |
| 24 | h2oai/h2o4gpu | H2O.ai GPU edition. | Established |
| 25 | aksnzhy/xlearn | High-performance, easy-to-use, and scalable machine learning (ML) package,... | Emerging |
| 26 | PaddlePaddle/PaddleCloud | PaddlePaddle Docker images and K8s operators for PaddleOCR/Detection... | Emerging |
| 27 | Hsword/Hetu | A high-performance distributed deep learning system targeting large-scale... | Emerging |
| 28 | bytedance/byteps | A high-performance and generic framework for distributed DNN training. | Emerging |
| 29 | lynxkite/lynxkite | The complete graph data science platform. | Emerging |
| 30 | Oneflow-Inc/libai | LiBai (李白): a toolbox for large-scale distributed parallel training. | Emerging |
| 31 | sehoffmann/dmlcloud | Painless distributed training with torch. | Emerging |
| 32 | nf-core/deepmodeloptim | Stochastic testing and input manipulation for unbiased learning systems. | Emerging |
| 33 | alibaba/EasyParallelLibrary | Easy Parallel Library (EPL) is a general and efficient deep learning... | Emerging |
| 34 | mars-project/mars | Mars is a tensor-based unified framework for large-scale data computation... | Emerging |
| 35 | determined-ai/determined | Determined is an open-source machine learning platform that simplifies... | Emerging |
| 36 | BBEK-Anand/PyTorchLabFlow | Manage PyTorch experiments with ease; analyse all components of the training pipeline. | Emerging |
| 37 | array2d/deepx | Large-scale auto-distributed training/inference unified framework \|... | Emerging |
| 38 | uber/fiber | Distributed computing for AI made simple. | Emerging |
| 39 | saforem2/ezpz | Train across all your devices, ezpz 🍋 | Emerging |
| 40 | williamFalcon/test-tube | Python library to easily log experiments and parallelize hyperparameter... | Emerging |
| 41 | unslothai/hyperlearn | 2-2000x faster ML algorithms, 50% less memory usage; works on all hardware, new and old. | Emerging |
| 42 | allenai/tango | Organize your experiments into discrete steps that can be cached and reused... | Emerging |
| 43 | IntelPython/sdc | Numba extension for compiling Pandas data frames; Intel® Scalable Dataframe Compiler. | Emerging |
| 44 | geoffxy/habitat | 🔮 Execution time predictions for deep neural network training iterations... | Emerging |
| 45 | hegongshan/Storage-for-AI-Paper | Accelerating AI training and inference from a storage perspective (must-read... | Emerging |
| 46 | rkhan055/SHADE | SHADE: enable fundamental cacheability for distributed deep learning training. | Emerging |
| 47 | rentainhe/pytorch-distributed-training | Simple tutorials on PyTorch DDP training. | Emerging |
| 48 | hora-search/horapy | 🐍 Python binding for the Hora approximate nearest neighbor search algorithm library. | Emerging |
| 49 | hkproj/pytorch-transformer-distributed | Distributed training (multi-node) of a Transformer model. | Emerging |
| 50 | alibaba/TePDist | TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed... | Emerging |
| 51 | Asthestarsfalll/ExCore | A modern configuration/registry system designed for deep learning, with some utils. | Emerging |
| 52 | adalkiran/distributed-inference | A project to demonstrate an approach to designing cross-language and... | Emerging |
| 53 | flow2ml/Flow2ML | An open-source library to make the machine learning process much simpler. | Emerging |
| 54 | lsds/Crossbow | Crossbow: a multi-GPU deep learning system for training with small batch sizes. | Emerging |
| 55 | r-xla/stablehlo | Create StableHLO programs in R. | Emerging |
| 56 | gsyang33/Driple | 🚨 Prediction of the resource consumption of distributed deep learning systems. | Emerging |
| 57 | lucasbrianpiveta/Hetu-DiT | 🚀 Optimize your diffusion transformers with Hetu-DiT, a dynamic parallel... | Emerging |
| 58 | Youhe-Jiang/IJCAI2023-OptimalShardedDataParallel | [IJCAI 2023] An automated parallel training system that combines the... | Emerging |
| 59 | ravenprotocol/ravnest | Decentralized asynchronous training on heterogeneous devices. | Emerging |
| 60 | openclimatefix/ocf_datapipes | OCF's DataPipe-based dataloader for training and inference. | Emerging |
| 61 | neelsomani/kv-marketplace | Cross-GPU KV cache marketplace. | Emerging |
| 62 | deepfinch/XLearning-GPU | Qihoo 360 XLearning with GPU support; AI on Hadoop. | Emerging |
| 63 | PLCnext/MLnext-Framework | MLnext Framework is an open-source framework for hardware-independent... | Emerging |
| 64 | paypal/gators | Gators is a package to handle model building with big data and fast... | Emerging |
| 65 | qhliu26/Dive-into-Big-Model-Training | 📑 Dive into big model training. | Emerging |
| 66 | AlibabaPAI/FlashModels | Fast and easy distributed model training examples. | Experimental |
| 67 | eagomez2/moduleprofiler | Free open-source package to profile PyTorch models. | Experimental |
| 68 | gmasse/gpu-specs | This project aims to centralize detailed specifications for GPUs,... | Experimental |
| 69 | CEA-LIST/RPCDataloader | A variant of the PyTorch DataLoader using remote workers. | Experimental |
| 70 | siboehm/ShallowSpeed | Small-scale distributed training of sequential deep learning models, built... | Experimental |
| 71 | NERSC/sc25-dl-tutorial | Deep Learning at Scale @ SC25. | Experimental |
| 72 | NERSC/dl-at-scale-training | Deep Learning at Scale training event at NERSC. | Experimental |
| 73 | yanisZirem/prism-profiler | Profiler desktop versions. | Experimental |
| 74 | ANRGUSC/ML_onChain | A Python-to-Solidity translator that generates on-chain neural networks. | Experimental |
| 75 | astariul/gibbs | Scale your ML workers asynchronously across processes and machines. | Experimental |
| 76 | earthai-tech/gofast | gofast: all-in-one machine learning package. | Experimental |
| 77 | AbdelStark/nostrain | Coordinator-free distributed ML training over Nostr relays. | Experimental |
| 78 | google/iopddl | Supplemental materials for the ASPLOS 2025 / EuroSys 2025 contest on... | Experimental |
| 79 | mrtan-ys/RoleML | Role-oriented programming model for distributed ML. | Experimental |
| 80 | Kushalk0677/Priority-Aware-Adaptive-Scheduling-for-Multi-Model-Edge-AI-Systems | Priority-Aware Edge Scheduler (PAES) for concurrent multi-model AI inference... | Experimental |
| 81 | NERSC/dl4sci25-dl-at-scale | Deep learning for science school material, 2025. | Experimental |
| 82 | alpha-one-index/ai-infra-index | Comprehensive technical reference for AI hardware: GPUs, TPUs, inference... | Experimental |
| 83 | Continuum-Intelligence/continuum-hydra | Performance-first ML systems toolkit for environment diagnostics and... | Experimental |
| 84 | 0xNaN/edufsdp | A minimal, educational implementation of Fully Sharded Data Parallel (FSDP). | Experimental |
| 85 | rasbt/b3-basic-batchsize-benchmark | Experiments for the blog post "No, We Don't Have to Choose Batch Sizes As... | Experimental |
| 86 | michael-borck/loco-convoy | Documentation and experiments for running AI inference workloads across multiple GPUs. | Experimental |
| 87 | poojakira/Predictive-GPU-Memory-Defragmenter | A production-grade Transformer-driven system that predicts GPU memory... | Experimental |
| 88 | Szhuaa/PyFlightProfiler | 🌟 Boost Python application performance with PyFlightProfiler, a toolbox for... | Experimental |
| 89 | rogue-agent1/markov-chain-py | Markov chain simulation with stationary distribution. | Experimental |
| 90 | rogue-agent1/yamltoml | Convert between JSON, YAML, and TOML formats. | Experimental |
| 91 | rogue-agent1/toml2json | toml2json: convert between TOML and JSON. | Experimental |
| 92 | zamfir70/TransXform | Training supervisor: live monitoring, early stopping, and checkpoint... | Experimental |
| 93 | chirasin99/hecate-os | ⚙️ Optimize your Linux experience with HecateOS, a performance-driven... | Experimental |
| 94 | lt-asset/D3 | "D3: Differential Testing of Distributed Deep Learning with Model... | Experimental |
| 95 | cake-lab/DELI | Optimizing loading training data from cloud bucket storage for cloud-based... | Experimental |
| 96 | marcos-venicius/smlf | A small machine learning framework with ONLY Python and math. | Experimental |
| 97 | alvarobartt/ml-monitoring-with-wandb | Monitoring a PyTorch Lightning CNN with Weights & Biases. | Experimental |
| 98 | Arakiss/hecate-os | Linux distro with automatic hardware detection and per-system optimization.... | Experimental |
| 99 | JeffWigger/FastDynamicBatcher | FastDynamicBatcher is a library for batching inputs across requests to... | Experimental |
| 100 | ashishpatel26/Rapidsai_Machine_learning_on_GPU | RAPIDS AI machine learning on GPU. | Experimental |
| 101 | Dev-next-gen/Bittensor-rocm | ROCm-compatible fork of Bittensor with full PyTorch 2.4 ROCm support: wallet,... | Experimental |
| 102 | gdf-ai/gdf | Open-source community GPU network for distributed AI model training. | Experimental |
| 103 | PatrickPontes44/tiny-panda | tiny-panda is a lightweight JavaScript library inspired by Python's pandas.... | Experimental |
| 104 | Kritim708/multi-gpu-deep-learning-nvidia-workshop | This repository contains a project I created as part of the NVIDIA workshop... | Experimental |
| 105 | Prelf1992/distributed-ml-training-system | A proof-of-concept for a distributed machine learning training system,... | Experimental |
| 106 | Gaius-del/python_hpc_2025 | 🚀 Accelerate scientific applications in supercomputing with Python using... | Experimental |
| 107 | Pects1949/Python-Distributed-ML-Framework | A Python framework for distributed machine learning training, leveraging... | Experimental |
| 108 | shivangraval50/distributed-ml-training | Distributed ML training platform achieving 10.6× speedup \| PyTorch DDP \|... | Experimental |
| 109 | JagjeevanAK/CruxML | (Under development) A minified machine learning and deep learning framework/library. | Experimental |
| 110 | DaveAldon/Distributed-ML-with-MLX | 🍎👉🍏 Everything you need in order to get started building distributed machine... | Experimental |
| 111 | dlzou/computron | Serving distributed deep learning models with model-parallel swapping. | Experimental |
| 112 | explcre/pipeDejavu | pipeDejavu: hardware-aware, latency-predictable, differentiable search for... | Experimental |
| 113 | ArslanKamchybekov/raydar | Raydar is the smart lost-and-found platform designed specifically for UIC... | Experimental |
| 114 | Jason-Wang313/OmniTrace | A full-stack GPU profiling and simulation framework that bridges high-level... | Experimental |
| 115 | GUT-AI/memory-bottleneck | Memory bottleneck of deep learning models. | Experimental |
| 116 | explcre/SHUKUN-Technology-AlgorithmIntern-MultiNodeTraining-for-DLmodels-Horovod-ConfigurationTutorial-Perf | SHUKUN Technology Co., Ltd algorithm intern (2020/12-2021/5). Multi-GPU,... | Experimental |