mosaicml/streaming
A Data Streaming Library for Efficient Neural Network Training
Streams training data from cloud storage (S3, GCS, Azure, OCI) in the sharded MDS format, with automatic caching and decompression, so the full dataset never has to be downloaded in advance. Provides distributed-training-aware sampling and shuffling through a drop-in replacement for PyTorch's `IterableDataset`, and supports images, text, video, and multimodal data with optional compression (e.g. zstd).
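To make "distributed-training-aware sampling and shuffling" concrete, here is a minimal, standard-library-only sketch of the general idea: every rank derives the same permutation from a shared seed and the epoch number, then takes its own disjoint slice. The function name `partition_samples` and its parameters are hypothetical and chosen for illustration. This is not the library's implementation, which works at the shard level and is likely considerably more involved.

```python
import random


def partition_samples(num_samples, world_size, rank, epoch, seed=0, shuffle=True):
    """Return the sample indices that one rank reads in a given epoch.

    Hypothetical helper for illustration only; not part of mosaicml/streaming.
    """
    indices = list(range(num_samples))
    if shuffle:
        # The same seed and epoch on every rank yield the same permutation,
        # so the ranks agree on the global order without communicating.
        random.Random(seed + epoch).shuffle(indices)

    # Round up so every rank gets the same number of samples; the tail is
    # padded by wrapping around to the start of the permutation.
    per_rank = -(-num_samples // world_size)  # ceiling division
    total = per_rank * world_size
    padded = [indices[i % num_samples] for i in range(total)]

    # Strided slice: rank r takes positions r, r + world_size, ...
    return padded[rank::world_size]


if __name__ == "__main__":
    for r in range(4):
        print(r, partition_samples(10, world_size=4, rank=r, epoch=0))
```

Each rank receives the same number of indices, and together the ranks cover every sample. Because 10 does not divide evenly by 4, two samples are repeated as padding in this run.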
Stars: 1,472
Forks: 189
Language: Python
License: Apache-2.0
Last pushed: Feb 02, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/mosaicml/streaming"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
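The same request can be made from Python with only the standard library. The sketch below rests on two assumptions that the page does not confirm: that the endpoint returns JSON, and that the path follows a category/owner/repo pattern for other repositories as well. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path pattern inferred from the curl example)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("ml-frameworks", "mosaicml", "streaming"))
```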
Related frameworks
opentensor/bittensor
Internet-scale Neural Networks
trailofbits/fickling
A Python pickling decompiler and static analyzer
benchopt/benchopt
A framework for reproducible, comparable benchmarks
BiomedSciAI/fuse-med-ml
A python framework accelerating ML based discovery in the medical field by encouraging code...
taoshidev/vanta-network
Vanta Network built on Bittensor