yhinsson/airllm
🚀 Optimizes memory for large language models, enabling 70B models on a 4GB GPU and Llama 3.1 405B on 8GB of VRAM, without compression techniques.
Quality score: 13 / 100
Experimental · No License · No Package · No Dependents
Maintenance: 10 / 25
Adoption: 2 / 25
Maturity: 1 / 25
Community: 0 / 25
Stars: 2
Forks: —
Language: —
License: —
Category: —
Last pushed: Feb 03, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/yhinsson/airllm"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
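For scripted access, here is a minimal Python sketch of the same request, using only the standard library. The response structure is not documented on this page, so the script prints the raw JSON rather than assuming field names; how an API key is supplied is likewise not shown here.

# Minimal sketch: fetch the quality report shown on this page via the open API.
# The response schema is undocumented here, so we print the raw JSON and let
# the caller inspect it instead of guessing key names.
import json
from urllib.request import urlopen

repo = "transformers/yhinsson/airllm"  # ecosystem/owner/name, as in the curl example
url = f"https://pt-edge.onrender.com/api/v1/quality/{repo}"

with urlopen(url, timeout=10) as resp:  # anonymous access: 100 requests/day
    report = json.load(resp)

print(json.dumps(report, indent=2, ensure_ascii=False))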
Higher-rated alternatives
lyogavin/airllm (score 83): AirLLM 70B inference with single 4GB GPU
shibing624/MedicalGPT (score 65): MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline....
GradientHQ/parallax (score 60): Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere
CrazyBoyM/llama3-Chinese-chat (score 55): Chinese post-training repository for Llama3 and Llama3.1: fine-tuned and hacked variants with interesting weights, plus tutorial videos and documentation for training, inference, evaluation, and deployment.
MediaBrain-SJTU/MING (score 49): 明医 (MING), a large language model for Chinese medical consultation