bytedance/byteps
A high performance and generic framework for distributed DNN training
Archived. Supports TensorFlow, PyTorch, Keras, and MXNet with both TCP and RDMA networking, using a cloud-optimized architecture that replaces MPI with custom inter-machine communication alongside intra-machine NCCL. Incorporates hierarchical strategies, pipelining, tensor partitioning, and priority-based scheduling to achieve ~90% scaling efficiency on 256 GPUs, significantly outperforming Horovod+NCCL, particularly on bandwidth-constrained networks. A Horovod-compatible API means migrating existing Horovod training scripts requires minimal code changes.
3,718 stars. No commits in the last 6 months.
Stars: 3,718
Forks: 495
Language: Python
License: —
Category: ml-frameworks
Last pushed: Oct 03, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/bytedance/byteps"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
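For programmatic use, the curl call above can be reproduced with Python's standard library. This is a minimal sketch: the URL structure follows the example above, but the response schema and the header name for passing an API key (`X-API-Key` below) are assumptions, not documented behavior.

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given repository."""
    return f"{API_BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, api_key=None) -> dict:
    """Fetch quality data as parsed JSON.

    The 'X-API-Key' header name is hypothetical; check the service's
    docs for the actual mechanism if you request a free key.
    """
    req = urllib.request.Request(quality_url(category, owner, repo))
    if api_key:
        req.add_header("X-API-Key", api_key)  # assumed header name
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(quality_url("ml-frameworks", "bytedance", "byteps"))
```

Without a key the endpoint allows 100 requests/day, so cache responses rather than polling in a loop.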
Higher-rated alternatives
deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference...
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
bsc-wdc/dislib
The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.
xorbitsai/xorbits
Scalable Python DS & ML, in an API compatible & lightning fast way.