Jason-cs18/HetServe-Foundation
An Overview of Efficiently Serving Foundation Models across Edge Devices
This overview helps engineers and system architects understand how to deploy foundation models, such as large language models and diffusion models, across many different devices, including less powerful ones. It outlines strategies and techniques for delivering model outputs to users quickly and efficiently on heterogeneous hardware. It is intended for anyone building or managing systems that serve AI models to end users.
No commits in the last 6 months.
Use this if you need to deploy large AI models efficiently across many distributed, diverse devices while maintaining low latency and high scalability.
Not ideal if you are looking for an implementation-ready library or a detailed guide for a single, powerful server deployment.
Stars: 14
Forks: —
Language: —
License: —
Category:
Last pushed: Jan 17, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/Jason-cs18/HetServe-Foundation"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
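The curl call above can also be issued from a script. The following is a minimal Python sketch, not an official client. It assumes the path layout is `/api/v1/quality/<category>/<owner>/<repo>`, which is inferred only from the single curl example. The helper names `build_quality_url` and `fetch_quality` are hypothetical. The response format is not documented on this page, so the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base endpoint copied from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository.

    ASSUMPTION: the path layout <category>/<owner>/<repo> is inferred
    from the one curl example; it is not a documented contract.
    """
    # Percent-encode each part; keep the slash between owner and repo.
    return f"{BASE_URL}/{quote(category, safe='')}/{quote(repo, safe='/')}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text.

    ASSUMPTION: the response format is unknown here, so no parsing is done.
    """
    with urlopen(build_quality_url(category, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Prints the same URL used in the curl example.
    print(build_quality_url("diffusion", "Jason-cs18/HetServe-Foundation"))
```

Calling `fetch_quality("diffusion", "Jason-cs18/HetServe-Foundation")` performs the same request as the curl command; with no key, requests count against the 100/day limit stated above.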
Higher-rated alternatives
PrunaAI/pruna
Pruna is a model optimization framework built for developers, enabling you to deliver faster,...
bytedance/LatentSync
Taming Stable Diffusion for Lip Sync!
haoheliu/AudioLDM-training-finetuning
AudioLDM training, finetuning, evaluation and inference.
Text-to-Audio/Make-An-Audio
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
teticio/audio-diffusion
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead...