# Distributed Training Frameworks
Frameworks and libraries for distributed training of machine learning models across multiple GPUs, nodes, or devices using data parallelism, model parallelism, or hybrid approaches. Does NOT include single-machine training optimization, inference frameworks, or educational tutorials on distributed concepts without working implementations.
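Data parallelism, mentioned above, rests on a simple identity: for a loss defined as a mean over examples, averaging the per-shard gradients reproduces the full-batch gradient, which is why an all-reduce of gradients is enough to keep model replicas in sync. A minimal, framework-free sketch (plain Python, a hypothetical one-parameter linear model, not taken from any listed project):

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y ≈ w * x with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Single worker: gradient over the full batch.
full = grad_mse(w, xs, ys)

# Two "workers": each computes the gradient on its own equal-sized shard,
# then an all-reduce averages the results (simulated here as a plain mean).
shard_grads = [grad_mse(w, xs[:2], ys[:2]), grad_mse(w, xs[2:], ys[2:])]
averaged = sum(shard_grads) / len(shard_grads)

print(full, averaged)  # identical: -22.5 -22.5
```

The frameworks in the table below (DeepSpeed, Horovod, byteps, and others) implement this same averaging with fused, bandwidth-optimal all-reduce collectives across GPUs and nodes.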
There are 116 distributed training frameworks tracked; 4 score above 70 (verified tier). The highest-rated is deepspeedai/DeepSpeed at 94/100, with 41,801 stars and 1,187,695 monthly downloads. Only 1 of the top 10 is actively maintained.
Get all 116 projects as JSON (the `limit` query parameter caps results per request; the example below fetches 20):

```shell
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=distributed-training-frameworks&limit=20"
```

The API is open to everyone: 100 requests/day with no key, or 1,000/day with a free key.
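The same query can be built from Python's standard library instead of curl. A hedged sketch: the response field names used below (`projects`, `name`, `score`) are assumptions for illustration, not documented schema, and the only score taken from this page is DeepSpeed's 94/100.

```python
import json
from urllib.parse import urlencode

BASE = "https://pt-edge.onrender.com/api/v1/datasets/quality"
query = urlencode({
    "domain": "ml-frameworks",
    "subcategory": "distributed-training-frameworks",
    "limit": 20,
})
url = f"{BASE}?{query}"
print(url)

# Offline stand-in for the JSON body; in practice, fetch `url` (e.g. with
# urllib.request or requests) and parse the response the same way.
sample = json.loads("""
{"projects": [
    {"name": "deepspeedai/DeepSpeed", "score": 94},
    {"name": "example/hypothetical", "score": 42}
]}
""")
verified = [p["name"] for p in sample["projects"] if p["score"] > 70]
print(verified)  # ['deepspeedai/DeepSpeed']
```

The 100-request/day unauthenticated quota makes a cached local copy of the JSON worthwhile if you iterate on filters.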
| # | Framework | Description | Tier |
|---|---|---|---|
| 1 | deepspeedai/DeepSpeed | DeepSpeed is a deep learning optimization library that makes distributed... | Verified |
| 2 | horovod/horovod | Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. | Verified |
| 3 | helmholtz-analytics/heat | Distributed tensors and machine learning framework with GPU and MPI... | Verified |
| 4 | bsc-wdc/dislib | The distributed computing library for Python, implemented using PyCOMPSs... | Verified |
| 5 | google/sedpack | Sedpack: scalable and efficient data packing. | Established |
| 6 | xorbitsai/xorbits | Scalable Python DS & ML, in an API-compatible and lightning-fast way. | Established |
| 7 | hpcaitech/ColossalAI | Making large AI models cheaper, faster, and more accessible. | Established |
| 8 | learning-at-home/hivemind | Decentralized deep learning in PyTorch. Built to train models on thousands... | Established |
| 9 | HazyResearch/fonduer | A knowledge base construction engine for richly formatted data. | Established |
| 10 | kakaobrain/torchgpipe | A GPipe implementation in PyTorch. | Established |
| 11 | cylondata/cylon | Cylon is a fast, scalable, distributed-memory, parallel runtime with a... | Established |
| 12 | spotify/pythonflow | Dataflow programming for Python. | Established |
| 13 | fastai/fastgpu | A queue service for quickly developing scripts that use all your GPUs efficiently. | Established |
| 14 | NimbleBoxAI/nbox | The official Python package for NimbleBox. Exposes all APIs as CLIs and... | Established |
| 15 | btursunbayev/nvsonar | Active GPU diagnostic tool that identifies performance bottlenecks using micro-probes. | Established |
| 16 | TGSAI/mdio-python | Cloud-native, scalable storage engine for various types of energy data. | Established |
| 17 | BaguaSys/bagua | Bagua speeds up PyTorch. | Established |
| 18 | PanJinquan/Pytorch-Base-Trainer | A PyTorch distributed training framework. | Established |
| 19 | Mitchell-Mirano/sorix | Sorix: high performance, easy to learn, fast to code, from prototype to production. | Established |
| 20 | maxpumperla/elephas | Distributed deep learning with Keras & Spark. | Established |
| 21 | cerndb/dist-keras | Distributed deep learning, with a focus on distributed training, using Keras... | Established |
| 22 | IBM/FfDL | Fabric for Deep Learning (FfDL, pronounced "fiddle") is a deep learning... | Established |
| 23 | firmai/pandapy | PandaPy has the speed of NumPy and the usability of Pandas; 10x to 50x faster... | Established |
| 24 | h2oai/h2o4gpu | H2O.ai GPU edition. | Established |
| 25 | aksnzhy/xlearn | High-performance, easy-to-use, and scalable machine learning (ML) package,... | Emerging |
| 26 | PaddlePaddle/PaddleCloud | PaddlePaddle Docker images and K8s operators for PaddleOCR/Detection... | Emerging |
| 27 | Hsword/Hetu | A high-performance distributed deep learning system targeting large-scale... | Emerging |
| 28 | bytedance/byteps | A high-performance and generic framework for distributed DNN training. | Emerging |
| 29 | lynxkite/lynxkite | The complete graph data science platform. | Emerging |
| 30 | Oneflow-Inc/libai | LiBai (李白): a toolbox for large-scale distributed parallel training. | Emerging |
| 31 | sehoffmann/dmlcloud | Painless distributed training with torch. | Emerging |
| 32 | nf-core/deepmodeloptim | Stochastic testing and input manipulation for unbiased learning systems. | Emerging |
| 33 | alibaba/EasyParallelLibrary | Easy Parallel Library (EPL) is a general and efficient deep learning... | Emerging |
| 34 | mars-project/mars | Mars is a tensor-based unified framework for large-scale data computation... | Emerging |
| 35 | determined-ai/determined | Determined is an open-source machine learning platform that simplifies... | Emerging |
| 36 | BBEK-Anand/PyTorchLabFlow | Manage PyTorch experiments with ease; analyse all components of the training pipeline. | Emerging |
| 37 | array2d/deepx | Large-scale auto-distributed training/inference unified framework \|... | Emerging |
| 38 | uber/fiber | Distributed computing for AI made simple. | Emerging |
| 39 | saforem2/ezpz | Train across all your devices, ezpz 🍋 | Emerging |
| 40 | williamFalcon/test-tube | Python library to easily log experiments and parallelize hyperparameter... | Emerging |
| 41 | unslothai/hyperlearn | 2-2000x faster ML algorithms, 50% less memory usage; works on all hardware, new and old. | Emerging |
| 42 | allenai/tango | Organize your experiments into discrete steps that can be cached and reused... | Emerging |
| 43 | IntelPython/sdc | Numba extension for compiling Pandas data frames; Intel® Scalable Dataframe Compiler. | Emerging |
| 44 | geoffxy/habitat | 🔮 Execution time predictions for deep neural network training iterations... | Emerging |
| 45 | hegongshan/Storage-for-AI-Paper | Accelerating AI training and inference from a storage perspective (must-read... | Emerging |
| 46 | rkhan055/SHADE | SHADE: enable fundamental cacheability for distributed deep learning training. | Emerging |
| 47 | rentainhe/pytorch-distributed-training | Simple tutorials on PyTorch DDP training. | Emerging |
| 48 | hora-search/horapy | 🐍 Python binding for the Hora approximate nearest neighbor search algorithm library. | Emerging |
| 49 | hkproj/pytorch-transformer-distributed | Distributed training (multi-node) of a Transformer model. | Emerging |
| 50 | alibaba/TePDist | TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed... | Emerging |
| 51 | Asthestarsfalll/ExCore | A modern configuration/registry system designed for deep learning, with some utils. | Emerging |
| 52 | adalkiran/distributed-inference | A project to demonstrate an approach to designing cross-language and... | Emerging |
| 53 | flow2ml/Flow2ML | An open-source library to make the machine learning process much simpler. | Emerging |
| 54 | lsds/Crossbow | Crossbow: a multi-GPU deep learning system for training with small batch sizes. | Emerging |
| 55 | r-xla/stablehlo | Create StableHLO programs in R. | Emerging |
| 56 | gsyang33/Driple | 🚨 Prediction of the resource consumption of distributed deep learning systems. | Emerging |
| 57 | lucasbrianpiveta/Hetu-DiT | 🚀 Optimize your diffusion transformers with Hetu-DiT, a dynamic parallel... | Emerging |
| 58 | Youhe-Jiang/IJCAI2023-OptimalShardedDataParallel | [IJCAI 2023] An automated parallel training system that combines the... | Emerging |
| 59 | ravenprotocol/ravnest | Decentralized asynchronous training on heterogeneous devices. | Emerging |
| 60 | openclimatefix/ocf_datapipes | OCF's DataPipe-based dataloader for training and inference. | Emerging |
| 61 | neelsomani/kv-marketplace | Cross-GPU KV cache marketplace. | Emerging |
| 62 | deepfinch/XLearning-GPU | Qihoo 360 XLearning with GPU support; AI on Hadoop. | Emerging |
| 63 | PLCnext/MLnext-Framework | MLnext Framework is an open-source framework for hardware-independent... | Emerging |
| 64 | paypal/gators | Gators is a package to handle model building with big data and fast... | Emerging |
| 65 | qhliu26/Dive-into-Big-Model-Training | 📑 Dive into big model training. | Emerging |
| 66 | AlibabaPAI/FlashModels | Fast and easy distributed model training examples. | Experimental |
| 67 | eagomez2/moduleprofiler | Free open-source package to profile PyTorch models. | Experimental |
| 68 | gmasse/gpu-specs | This project aims to centralize detailed specifications for GPUs,... | Experimental |
| 69 | CEA-LIST/RPCDataloader | A variant of the PyTorch DataLoader using remote workers. | Experimental |
| 70 | siboehm/ShallowSpeed | Small-scale distributed training of sequential deep learning models, built... | Experimental |
| 71 | NERSC/sc25-dl-tutorial | Deep Learning at Scale @ SC25. | Experimental |
| 72 | NERSC/dl-at-scale-training | Deep Learning at Scale training event at NERSC. | Experimental |
| 73 | yanisZirem/prism-profiler | Profiler desktop versions. | Experimental |
| 74 | ANRGUSC/ML_onChain | A Python-to-Solidity translator that generates on-chain neural networks. | Experimental |
| 75 | astariul/gibbs | Scale your ML workers asynchronously across processes and machines. | Experimental |
| 76 | earthai-tech/gofast | gofast: all-in-one machine learning package. | Experimental |
| 77 | AbdelStark/nostrain | Coordinator-free distributed ML training over Nostr relays. | Experimental |
| 78 | google/iopddl | Supplemental materials for the ASPLOS 2025 / EuroSys 2025 contest on... | Experimental |
| 79 | mrtan-ys/RoleML | Role-oriented programming model for distributed ML. | Experimental |
| 80 | Kushalk0677/Priority-Aware-Adaptive-Scheduling-for-Multi-Model-Edge-AI-Systems | Priority-Aware Edge Scheduler (PAES) for concurrent multi-model AI inference... | Experimental |
| 81 | NERSC/dl4sci25-dl-at-scale | Deep learning for science school material, 2025. | Experimental |
| 82 | alpha-one-index/ai-infra-index | Comprehensive technical reference for AI hardware: GPUs, TPUs, inference... | Experimental |
| 83 | Continuum-Intelligence/continuum-hydra | Performance-first ML systems toolkit for environment diagnostics and... | Experimental |
| 84 | 0xNaN/edufsdp | A minimal, educational implementation of Fully Sharded Data Parallel (FSDP). | Experimental |
| 85 | rasbt/b3-basic-batchsize-benchmark | Experiments for the blog post "No, We Don't Have to Choose Batch Sizes As... | Experimental |
| 86 | michael-borck/loco-convoy | Documentation and experiments for running AI inference workloads across multiple GPUs. | Experimental |
| 87 | poojakira/Predictive-GPU-Memory-Defragmenter | A production-grade Transformer-driven system that predicts GPU memory... | Experimental |
| 88 | Szhuaa/PyFlightProfiler | 🌟 Boost Python application performance with PyFlightProfiler, a toolbox for... | Experimental |
| 89 | rogue-agent1/markov-chain-py | Markov chain simulation with stationary distribution. | Experimental |
| 90 | rogue-agent1/yamltoml | Convert between JSON, YAML, and TOML formats. | Experimental |
| 91 | rogue-agent1/toml2json | toml2json: convert between TOML and JSON. | Experimental |
| 92 | zamfir70/TransXform | Training supervisor: live monitoring, early stopping, and checkpoint... | Experimental |
| 93 | chirasin99/hecate-os | ⚙️ Optimize your Linux experience with HecateOS, a performance-driven... | Experimental |
| 94 | lt-asset/D3 | "D3: Differential Testing of Distributed Deep Learning with Model... | Experimental |
| 95 | cake-lab/DELI | Optimizing loading training data from cloud bucket storage for cloud-based... | Experimental |
| 96 | marcos-venicius/smlf | A small machine learning framework with ONLY Python and math. | Experimental |
| 97 | alvarobartt/ml-monitoring-with-wandb | Monitoring a PyTorch Lightning CNN with Weights & Biases. | Experimental |
| 98 | Arakiss/hecate-os | Linux distro with automatic hardware detection and per-system optimization.... | Experimental |
| 99 | JeffWigger/FastDynamicBatcher | FastDynamicBatcher is a library for batching inputs across requests to... | Experimental |
| 100 | ashishpatel26/Rapidsai_Machine_learning_on_GPU | RAPIDS AI machine learning on GPU. | Experimental |
| 101 | Dev-next-gen/Bittensor-rocm | ROCm-compatible fork of Bittensor with full PyTorch 2.4 ROCm support: wallet,... | Experimental |
| 102 | gdf-ai/gdf | Open-source community GPU network for distributed AI model training. | Experimental |
| 103 | PatrickPontes44/tiny-panda | tiny-panda is a lightweight JavaScript library inspired by Python's pandas.... | Experimental |
| 104 | Kritim708/multi-gpu-deep-learning-nvidia-workshop | This repository contains a project I created as part of the NVIDIA workshop... | Experimental |
| 105 | Prelf1992/distributed-ml-training-system | A proof-of-concept for a distributed machine learning training system,... | Experimental |
| 106 | Gaius-del/python_hpc_2025 | 🚀 Accelerate scientific applications in supercomputing with Python using... | Experimental |
| 107 | Pects1949/Python-Distributed-ML-Framework | A Python framework for distributed machine learning training, leveraging... | Experimental |
| 108 | shivangraval50/distributed-ml-training | Distributed ML training platform achieving 10.6× speedup \| PyTorch DDP \|... | Experimental |
| 109 | JagjeevanAK/CruxML | (Under development) A minified machine learning and deep learning framework/library. | Experimental |
| 110 | DaveAldon/Distributed-ML-with-MLX | 🍎👉🍏 Everything you need in order to get started building distributed machine... | Experimental |
| 111 | dlzou/computron | Serving distributed deep learning models with model-parallel swapping. | Experimental |
| 112 | explcre/pipeDejavu | pipeDejavu: hardware-aware, latency-predictable, differentiable search for... | Experimental |
| 113 | ArslanKamchybekov/raydar | Raydar is the smart lost-and-found platform designed specifically for UIC... | Experimental |
| 114 | Jason-Wang313/OmniTrace | A full-stack GPU profiling and simulation framework that bridges high-level... | Experimental |
| 115 | GUT-AI/memory-bottleneck | Memory bottleneck of deep learning models. | Experimental |
| 116 | explcre/SHUKUN-Technology-AlgorithmIntern-MultiNodeTraining-for-DLmodels-Horovod-ConfigurationTutorial-Perf | SHUKUN Technology Co., Ltd algorithm intern (2020/12-2021/5). Multi-GPU,... | Experimental |