HPC Cluster Management ML Frameworks
Resources, guides, and tools for setting up, configuring, and managing HPC clusters and distributed computing infrastructure for ML workloads. Does NOT include general cloud computing platforms, containerization tools, or ML frameworks themselves.
32 HPC cluster management frameworks are tracked. Four score above 50 (Established tier). The highest-rated is qualcomm/ai-hub-models at 68/100 with 940 stars. Two of the top 10 are actively maintained.
Get all 32 projects as JSON
curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=hpc-cluster-management&limit=20"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
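The same request can be made from a script. The following is a minimal Python sketch using only the standard library. The endpoint and query parameters are copied from the curl command above; the helper names `build_url` and `fetch_projects` are hypothetical, and the shape of the returned JSON is not documented here, so the sketch returns the parsed payload without assuming any particular fields. Whether `limit` accepts values above 20 is also an assumption to verify against the API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ml-frameworks",
              subcategory="hpc-cluster-management",
              limit=20):
    """Build the query URL; defaults mirror the curl example."""
    query = urlencode({
        "domain": domain,
        "subcategory": subcategory,
        "limit": limit,
    })
    return f"{BASE_URL}?{query}"


def fetch_projects(url, timeout=30):
    """Fetch the URL and return the parsed JSON payload.

    The response structure is an assumption left open here:
    the payload is returned as-is (list or dict).
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_projects(build_url())` performs the keyless request, which counts against the 100 requests/day allowance.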
| # | Framework | Score | Tier |
|---|---|---|---|
| 1 | qualcomm/ai-hub-models: Qualcomm® AI Hub Models is our collection of state-of-the-art machine... | | Established |
| 2 | lincc-frameworks/hyrax: Hyrax - A low-code framework for rapid experimentation with ML &... | | Established |
| 3 | petuum/adaptdl: Resource-adaptive cluster scheduler for deep learning training. | | Established |
| 4 | zszazi/Deep-learning-in-cloud: List of Deep Learning Cloud Providers | | Established |
| 5 | openhackathons-org/gpubootcamp: This repository consists of GPU bootcamp material for HPC and AI | | Emerging |
| 6 | intel/ai-reference-models: Intel® AI Reference Models: contains Intel optimizations for running deep... | | Emerging |
| 7 | HydroRoll-Team/HydroRoll: A cross-platform, multitasking, highly customizable dice-bot development framework. | | Emerging |
| 8 | HPCNow/hpcnow-labs: HPCNow! training material and hands-on sessions | | Emerging |
| 9 | pescap/EasyHPC: A practical introduction to High Performance Computing (HPC) | | Emerging |
| 10 | opencomputeproject/ocp-diag-windtunnel: Building & testing private AI on HPC. | | Experimental |
| 11 | ray-project/ray-acm-workshop-2023: Scalable/Distributed Computer Vision with Ray | | Experimental |
| 12 | debnsuma/ray-for-developers: A comprehensive hands-on guide to building production-grade distributed... | | Experimental |
| 13 | binga/cloud-gpus: This repository contains information about Cloud GPU offerings for Machine... | | Experimental |
| 14 | hkust-hpc-team/hkust-hpc: Handbook for AI / HPC users on HKUST central clusters | | Experimental |
| 15 | Roulbac/uv-func: A Python decorator to run functions in isolated virtual environments... | | Experimental |
| 16 | knagrecha/hydra: Execution framework for multi-task model parallelism. Enables the training... | | Experimental |
| 17 | onlyrobot/bray: Bray is based on Ray and outperforms Ray in practical distributed... | | Experimental |
| 18 | gpu-cli/zerostart: Fast cold starts for GPU Python. Streaming wheel extraction for when large... | | Experimental |
| 19 | Skyld-Labs/ModelHunter: ModelHunter is a powerful pipeline designed to extract machine learning... | | Experimental |
| 20 | uw-mad-dash/shockwave: Artifact for "Shockwave: Fair and Efficient Cluster Scheduling for Dynamic... | | Experimental |
| 21 | hydra-hoard/hydra: A decentralised application that creates high quality machine learning datasets | | Experimental |
| 22 | jonathandinu/spark-ray-data-science: Supporting content (slides and exercises) for the Pearson video series... | | Experimental |
| 23 | parisimaa/NYU-HPC: NYU HPC user instruction | | Experimental |
| 24 | breadboardfoundry/GPU-Infrastructure: GPU compute infrastructure for research teams running machine learning experiments. | | Experimental |
| 25 | Adhytm/multi-gpu-debug-notes: Debugging and isolating GPU context preemption issues in heterogeneous... | | Experimental |
| 26 | RichardScottOZ/experimenta-ml-kiro: experimenta-ml for kiro-cli | | Experimental |
| 27 | Syntex-errorCode/stable-flakes: 🔗 Stabilize your Flakes easily with one input for reliable NixOS modules and... | | Experimental |
| 28 | erectbranch/enroot-on-slurm: Examples of using Enroot with Slurm for distributed deep learning | | Experimental |
| 29 | SupreethRao99/slurmy: template scripts and notes for using SLURM on Nvidia DGX GPU cluster | | Experimental |
| 30 | Akshay3510/Hydra: 🔍 Develop advanced knowledge compilers and #SAT solvers with Hydra, a robust... | | Experimental |
| 31 | alifzl/NeSI-Project-Template: NeSI HPC DL project Scaffolding Template | | Experimental |
| 32 | smirko-dev/machine-learning-rpi: Setup ML for Raspberry Pi | | Experimental |