TJ-Solergibert/transformers-in-supercomputers
Transformers training in a supercomputer with the 🤗 Stack and Slurm
This project helps machine learning engineers efficiently train large language models, specifically Transformer-based architectures, on supercomputing clusters. It provides practical examples and scripts for distributing model training across multiple GPUs and nodes. Given a Transformer model and a dataset, it produces a trained model in less wall-clock time, along with insights into which training configurations yield the best speed.
No commits in the last 6 months.
Use this if you are a machine learning engineer or researcher working with Transformer models and need to significantly reduce training times by utilizing multi-GPU or multi-node supercomputing environments managed by Slurm.
Not ideal if you are looking to train models on a single GPU or standard cloud instances, or if your primary concern is model accuracy rather than training efficiency and distributed performance.
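The multi-node, Slurm-managed launch described above can be sketched as a batch script. This is a hedged illustration, not taken from the repo: the script name `train.py`, the GPU count, and the resource requests are assumptions you would adapt to your cluster.

```shell
#!/bin/bash
#SBATCH --job-name=transformer-train
#SBATCH --nodes=2                 # two nodes (assumption; adjust to your allocation)
#SBATCH --ntasks-per-node=1       # one launcher per node; torchrun spawns the GPU workers
#SBATCH --gres=gpu:4              # 4 GPUs per node (assumption)
#SBATCH --time=02:00:00

# Rendezvous info for torchrun: the first node in the allocation acts as master.
MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
MASTER_PORT=29500

# Launch one torchrun per node; each spawns one worker process per local GPU.
# train.py is a placeholder for the actual 🤗 training script.
srun torchrun \
    --nnodes="$SLURM_NNODES" \
    --nproc_per_node=4 \
    --rdzv_backend=c10d \
    --rdzv_endpoint="$MASTER_ADDR:$MASTER_PORT" \
    train.py
```

Using `srun` to start one `torchrun` per node lets Slurm handle placement, while torchrun's c10d rendezvous wires the nodes into a single distributed job.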
Stars: 15
Forks: —
Language: Python
License: —
Category: —
Last pushed: May 09, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/TJ-Solergibert/transformers-in-supercomputers"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
openvinotoolkit/nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers...
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
huggingface/optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
eole-nlp/eole
Open language modeling toolkit based on PyTorch