Distributed Training Frameworks

Frameworks and libraries for distributed training of machine learning models across multiple GPUs, nodes, or devices using data parallelism, model parallelism, or hybrid approaches. Does NOT include single-machine training optimization, inference frameworks, or educational tutorials on distributed concepts without working implementations.
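
Data parallelism, the first of the approaches named above, can be illustrated without any framework. The following is a minimal pure-Python sketch under stated assumptions: a hypothetical one-parameter linear model (y_hat = w * x), equal-sized shards, and an all-reduce simulated in a single process with a plain average. Real frameworks run each worker on its own device or node and perform the all-reduce over a communication backend. None of the function names below belong to any listed library.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair for the hypothetical toy model


def local_gradient(w: float, shard: Sequence[Sample]) -> float:
    """dL/dw of the mean squared error of y_hat = w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: List[float]) -> float:
    """Stand-in for an all-reduce: every worker would receive this same mean."""
    return sum(values) / len(values)


def shard_data(data: Sequence[Sample], num_workers: int) -> List[Sequence[Sample]]:
    """Split the dataset into equal contiguous shards, one per worker."""
    if len(data) % num_workers != 0:
        raise ValueError("this sketch assumes len(data) is divisible by num_workers")
    size = len(data) // num_workers
    return [data[i * size:(i + 1) * size] for i in range(num_workers)]


def data_parallel_step(
    w: float, data: Sequence[Sample], num_workers: int, lr: float
) -> Tuple[float, float]:
    """One synchronous step: local gradients, all-reduce, identical update on all workers."""
    # In a real framework each local_gradient call runs concurrently on its own worker.
    grads = [local_gradient(w, shard) for shard in shard_data(data, num_workers)]
    g = all_reduce_mean(grads)
    return w - lr * g, g


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # toy data, true w = 3
    w = 0.0
    for _ in range(100):
        w, _ = data_parallel_step(w, data, num_workers=4, lr=0.01)
    print(f"learned w = {w:.6f}")
```

Because the shards have equal size, the mean of the per-shard gradients equals the full-batch gradient, so each synchronous step matches single-device training on the whole batch (up to floating-point reduction order).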

There are 116 distributed training frameworks tracked. Four score above 70 (Verified tier). The highest-rated is deepspeedai/DeepSpeed at 94/100, with 41,801 stars and 1,187,695 monthly downloads. One of the top 10 is actively maintained.
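
The tier labels in the ranking follow from the score. The cutoffs are not documented on this page, so the sketch below infers them from the listing itself: Verified is "above 70" per the summary, the lowest Established score shown is 50, and the lowest Emerging score shown is 30. No project scores exactly 70, so the tier assigned to that value is an assumption.

```python
def tier_for_score(score: int) -> str:
    """Map a 0-100 score to a tier; cutoffs are inferred from this listing (assumption)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score > 70:   # "above 70" is the Verified tier per the summary
        return "Verified"
    if score >= 50:  # lowest Established score in the listing is 50
        return "Established"
    if score >= 30:  # lowest Emerging score in the listing is 30
        return "Emerging"
    return "Experimental"


if __name__ == "__main__":
    for s in (94, 69, 49, 29):
        print(s, tier_for_score(s))
```

For example, tier_for_score(69) returns "Established", matching google/sedpack and xorbitsai/xorbits in the ranking.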

Get the projects as JSON (note that this example request sets limit=20):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=distributed-training-frameworks&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
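
The same request can be issued from Python with only the standard library. This is a sketch: the endpoint and query parameters are copied from the curl command above, but the structure of the JSON response is not described on this page, so fetch_projects returns the decoded JSON unchanged. How a free key is supplied is not stated here either, so the sketch sends no key. The helper names build_query_url and fetch_projects are hypothetical.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_query_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Rebuild the query string used in the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit: int = 20, timeout: float = 30.0) -> Any:
    """GET the dataset and return the decoded JSON as-is (no response schema assumed)."""
    url = build_query_url("ml-frameworks", "distributed-training-frameworks", limit)
    with urlopen(url, timeout=timeout) as response:  # raises HTTPError on 4xx/5xx
        return json.load(response)


if __name__ == "__main__":
    # Only builds the URL; call fetch_projects() to actually hit the API.
    print(build_query_url("ml-frameworks", "distributed-training-frameworks"))
```

Every fetch_projects call counts against the anonymous 100 requests/day limit, so caching the result locally is sensible when iterating.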

Rank | Framework (with description) | Score | Tier
1 deepspeedai/DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed...

94
Verified
2 horovod/horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

77
Verified
3 helmholtz-analytics/heat

Distributed tensors and Machine Learning framework with GPU and MPI...

77
Verified
4 bsc-wdc/dislib

The Distributed Computing library for python implemented using PyCOMPSs...

71
Verified
5 google/sedpack

Sedpack - Scalable and efficient data packing

69
Established
6 xorbitsai/xorbits

Scalable Python DS & ML, in an API-compatible & lightning-fast way.

69
Established
7 hpcaitech/ColossalAI

Making large AI models cheaper, faster and more accessible

68
Established
8 learning-at-home/hivemind

Decentralized deep learning in PyTorch. Built to train models on thousands...

66
Established
9 HazyResearch/fonduer

A knowledge base construction engine for richly formatted data

63
Established
10 kakaobrain/torchgpipe

A GPipe implementation in PyTorch

61
Established
11 cylondata/cylon

Cylon is a fast, scalable, distributed memory, parallel runtime with a...

59
Established
12 spotify/pythonflow

🐍 Dataflow programming for Python.

59
Established
13 fastai/fastgpu

A queue service for quickly developing scripts that use all your GPUs efficiently

58
Established
14 NimbleBoxAI/nbox

The official python package for NimbleBox. Exposes all APIs as CLIs and...

57
Established
15 btursunbayev/nvsonar

Active GPU diagnostic tool that identifies performance bottlenecks using micro-probes

55
Established
16 TGSAI/mdio-python

Cloud native, scalable storage engine for various types of energy data.

54
Established
17 BaguaSys/bagua

Bagua Speeds up PyTorch

53
Established
18 PanJinquan/Pytorch-Base-Trainer

PyTorch distributed training framework

52
Established
19 Mitchell-Mirano/sorix

Sorix, high performance, easy to learn, fast to code, from prototype to production

51
Established
20 maxpumperla/elephas

Distributed Deep learning with Keras & Spark

51
Established
21 cerndb/dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras...

51
Established
22 IBM/FfDL

Fabric for Deep Learning (FfDL, pronounced fiddle) is a Deep Learning...

51
Established
23 firmai/pandapy

PandaPy has the speed of NumPy and the usability of Pandas, 10x to 50x faster...

50
Established
24 h2oai/h2o4gpu

H2Oai GPU Edition

50
Established
25 aksnzhy/xlearn

High performance, easy-to-use, and scalable machine learning (ML) package,...

49
Emerging
26 PaddlePaddle/PaddleCloud

PaddlePaddle Docker images and K8s operators for PaddleOCR/Detection...

49
Emerging
27 Hsword/Hetu

A high-performance distributed deep learning system targeting large-scale...

48
Emerging
28 bytedance/byteps

A high performance and generic framework for distributed DNN training

48
Emerging
29 lynxkite/lynxkite

The complete graph data science platform

47
Emerging
30 Oneflow-Inc/libai

LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training

47
Emerging
31 sehoffmann/dmlcloud

Painless distributed training with torch

47
Emerging
32 nf-core/deepmodeloptim

Stochastic Testing and Input Manipulation for Unbiased Learning Systems

47
Emerging
33 alibaba/EasyParallelLibrary

Easy Parallel Library (EPL) is a general and efficient deep learning...

47
Emerging
34 mars-project/mars

Mars is a tensor-based unified framework for large-scale data computation...

47
Emerging
35 determined-ai/determined

Determined is an open-source machine learning platform that simplifies...

47
Emerging
36 BBEK-Anand/PyTorchLabFlow

Manage PyTorch experiments with ease and analyse all components of the training pipeline.

46
Emerging
37 array2d/deepx

Large-scale Auto-Distributed Training/Inference Unified Framework |...

45
Emerging
38 uber/fiber

Distributed Computing for AI Made Simple

45
Emerging
39 saforem2/ezpz

Train across all your devices, ezpz 🍋

45
Emerging
40 williamFalcon/test-tube

Python library to easily log experiments and parallelize hyperparameter...

44
Emerging
41 unslothai/hyperlearn

2-2000x faster ML algos, 50% less memory usage, works on all hardware, new and old.

44
Emerging
42 allenai/tango

Organize your experiments into discrete steps that can be cached and reused...

43
Emerging
43 IntelPython/sdc

Numba extension for compiling Pandas data frames, Intel® Scalable Dataframe Compiler

43
Emerging
44 geoffxy/habitat

🔮 Execution time predictions for deep neural network training iterations...

42
Emerging
45 hegongshan/Storage-for-AI-Paper

Accelerating AI Training and Inference from Storage Perspective (Must-read...

40
Emerging
46 rkhan055/SHADE

SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training

40
Emerging
47 rentainhe/pytorch-distributed-training

Simple tutorials on Pytorch DDP training

39
Emerging
48 hora-search/horapy

🐍 Python binding for the Hora Approximate Nearest Neighbor Search Algorithm library

38
Emerging
49 hkproj/pytorch-transformer-distributed

Distributed training (multi-node) of a Transformer model

38
Emerging
50 alibaba/TePDist

TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed...

37
Emerging
51 Asthestarsfalll/ExCore

A Modern Configuration/Registry System designed for deep learning, with some utilities.

37
Emerging
52 adalkiran/distributed-inference

A project to demonstrate an approach to designing cross-language and...

37
Emerging
53 flow2ml/Flow2ML

An open-source library to make the machine learning process much simpler

36
Emerging
54 lsds/Crossbow

Crossbow: A Multi-GPU Deep Learning System for Training with Small Batch Sizes

36
Emerging
55 r-xla/stablehlo

Create StableHLO programs in R

35
Emerging
56 gsyang33/Driple

🚨 Prediction of the Resource Consumption of Distributed Deep Learning Systems

35
Emerging
57 lucasbrianpiveta/Hetu-DiT

🚀 Optimize your Diffusion Transformers with Hetu-DiT, a dynamic parallel...

35
Emerging
58 Youhe-Jiang/IJCAI2023-OptimalShardedDataParallel

[IJCAI2023] An automated parallel training system that combines the...

34
Emerging
59 ravenprotocol/ravnest

Decentralized Asynchronous Training on Heterogeneous Devices

33
Emerging
60 openclimatefix/ocf_datapipes

OCF's DataPipe based dataloader for training and inference

32
Emerging
61 neelsomani/kv-marketplace

Cross-GPU KV Cache Marketplace

32
Emerging
62 deepfinch/XLearning-GPU

qihoo360 xlearning with GPU support; AI on Hadoop

32
Emerging
63 PLCnext/MLnext-Framework

MLnext Framework is an open source framework for hardware independent...

31
Emerging
64 paypal/gators

Gators is a package to handle model building with big data and fast...

30
Emerging
65 qhliu26/Dive-into-Big-Model-Training

📑 Dive into Big Model Training

30
Emerging
66 AlibabaPAI/FlashModels

Fast and easy distributed model training examples.

29
Experimental
67 eagomez2/moduleprofiler

Free open-source package to profile PyTorch models.

29
Experimental
68 gmasse/gpu-specs

This project aims to centralize detailed specifications for GPUs,...

29
Experimental
69 CEA-LIST/RPCDataloader

A variant of the PyTorch Dataloader using remote workers.

28
Experimental
70 siboehm/ShallowSpeed

Small scale distributed training of sequential deep learning models, built...

28
Experimental
71 NERSC/sc25-dl-tutorial

Deep Learning at Scale @ SC25

28
Experimental
72 NERSC/dl-at-scale-training

Deep Learning at Scale Training Event at NERSC

27
Experimental
73 yanisZirem/prism-profiler

profiler desktop versions

26
Experimental
74 ANRGUSC/ML_onChain

A python-solidity translator that generates on-chain neural networks

26
Experimental
75 astariul/gibbs

Scale your ML workers asynchronously across processes and machines

26
Experimental
76 earthai-tech/gofast

gofast: AIO machine learning package

26
Experimental
77 AbdelStark/nostrain

Coordinator-free distributed ML training over Nostr relays.

25
Experimental
78 google/iopddl

Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on...

25
Experimental
79 mrtan-ys/RoleML

Role-oriented programming model for distributed ML

25
Experimental
80 Kushalk0677/Priority-Aware-Adaptive-Scheduling-for-Multi-Model-Edge-AI-Systems

Priority-Aware Edge Scheduler (PAES) for concurrent multi-model AI inference...

24
Experimental
81 NERSC/dl4sci25-dl-at-scale

Deep learning for science school material 2025

24
Experimental
82 alpha-one-index/ai-infra-index

Comprehensive technical reference for AI hardware: GPUs, TPUs, inference...

23
Experimental
83 Continuum-Intelligence/continuum-hydra

Performance-first ML systems toolkit for environment diagnostics and...

23
Experimental
84 0xNaN/edufsdp

A minimal, educational implementation of Fully Sharded Data Parallel (FSDP).

22
Experimental
85 rasbt/b3-basic-batchsize-benchmark

Experiments for the blog post "No, We Don't Have to Choose Batch Sizes As...

22
Experimental
86 michael-borck/loco-convoy

Documentation and experiments for running AI inference workloads across multiple GPUs

22
Experimental
87 poojakira/Predictive-GPU-Memory-Defragmenter

A production-grade Transformer-driven system that predicts GPU memory...

22
Experimental
88 Szhuaa/PyFlightProfiler

🌟 Boost Python application performance with PyFlightProfiler, a toolbox for...

22
Experimental
89 rogue-agent1/markov-chain-py

Markov chain simulation with stationary distribution

22
Experimental
90 rogue-agent1/yamltoml

Convert between JSON, YAML, and TOML formats.

22
Experimental
91 rogue-agent1/toml2json

toml2json - Convert between TOML and JSON.

22
Experimental
92 zamfir70/TransXform

Training supervisor — live monitoring, early stopping, and checkpoint...

22
Experimental
93 chirasin99/hecate-os

⚙️ Optimize your Linux experience with HecateOS, a performance-driven...

22
Experimental
94 lt-asset/D3

"D3: Differential Testing of Distributed Deep Learning with Model...

22
Experimental
95 cake-lab/DELI

Optimizing loading training data from cloud bucket storage for cloud-based...

21
Experimental
96 marcos-venicius/smlf

A small machine learning framework with ONLY python and math

20
Experimental
97 alvarobartt/ml-monitoring-with-wandb

🕵️🤖 Monitoring a PyTorch Lightning CNN with Weights & Biases

20
Experimental
98 Arakiss/hecate-os

Linux distro with automatic hardware detection and per-system optimization....

20
Experimental
99 JeffWigger/FastDynamicBatcher

FastDynamicBatcher is a library for batching inputs across requests to...

19
Experimental
100 ashishpatel26/Rapidsai_Machine_learning_on_GPU

Rapidsai_Machine_learning_on_GPU

19
Experimental
101 Dev-next-gen/Bittensor-rocm

ROCm-compatible fork of Bittensor – Full PyTorch 2.4 ROCm support – Wallet,...

15
Experimental
102 gdf-ai/gdf

Open-source community GPU network for distributed AI model training

15
Experimental
103 PatrickPontes44/tiny-panda

tiny-panda is a lightweight JavaScript library inspired by Python’s pandas....

15
Experimental
104 Kritim708/multi-gpu-deep-learning-nvidia-workshop

This repository contains a project I created as part of the NVIDIA workshop...

15
Experimental
105 Prelf1992/distributed-ml-training-system

A proof-of-concept for a distributed machine learning training system,...

14
Experimental
106 Gaius-del/python_hpc_2025

🚀 Accelerate scientific applications in supercomputing with Python using...

14
Experimental
107 Pects1949/Python-Distributed-ML-Framework

A Python framework for distributed machine learning training, leveraging...

14
Experimental
108 shivangraval50/distributed-ml-training

Distributed ML training platform achieving 10.6× speedup | PyTorch DDP |...

14
Experimental
109 JagjeevanAK/CruxML

(Under development) A minified machine learning and deep learning framework/library.

14
Experimental
110 DaveAldon/Distributed-ML-with-MLX

🍎👉🍏 Everything you need in order to get started building distributed machine...

14
Experimental
111 dlzou/computron

Serving distributed deep learning models with model parallel swapping.

13
Experimental
112 explcre/pipeDejavu

pipeDejavu: Hardware-aware Latency Predictable, Differentiable Search for...

12
Experimental
113 ArslanKamchybekov/raydar

Raydar is the smart lost-and-found platform designed specifically for UIC...

11
Experimental
114 Jason-Wang313/OmniTrace

A full-stack GPU profiling and simulation framework that bridges high-level...

11
Experimental
115 GUT-AI/memory-bottleneck

Memory Bottleneck of Deep Learning models

10
Experimental
116 explcre/SHUKUN-Technology-AlgorithmIntern-MultiNodeTraining-for-DLmodels-Horovod-ConfigurationTutorial-Perf

SHUKUN Technology Co.,Ltd Algorithm intern (2020/12-2021/5). Multi-GPU,...

10
Experimental