zhuhanqing/APOLLO
APOLLO: SGD-like Memory, AdamW-level Performance; MLSys'25 Outstanding Paper Honorable Mention
APOLLO combines low-rank gradient approximation with channel-wise or tensor-wise scaling factors, computed via pure random projections in an auxiliary space, which eliminates expensive SVD computations. It is integrated into Hugging Face Transformers, LLaMA-Factory, and axolotl, and supports distributed training via FSDP as well as quantization-aware variants (QAPOLLO with int8 weights).
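The idea in the description above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the repository's implementation: the function names (`random_projection`, `init_state`, `apollo_like_step`), the Gaussian projection, the Adam-style bias correction, and the default hyperparameters are all assumptions made for the example. It projects a gradient into a low-rank auxiliary space with a random matrix (no SVD), keeps Adam-style moments only in that small space, and derives one scaling factor per channel (column) that is then applied to the full-rank gradient.

```python
import math
import random


def random_projection(rank, rows, seed):
    """Gaussian random projection of shape (rank x rows); assumed for illustration."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(rank)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(rows)] for _ in range(rank)]


def matmul(a, b):
    """Plain-Python product of a (p x q) and b (q x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def column_norms(mat):
    """Euclidean norm of each column."""
    return [math.sqrt(sum(row[j] ** 2 for row in mat)) for j in range(len(mat[0]))]


def init_state(rank, cols):
    """Optimizer state lives only in the low-rank space (rank x cols)."""
    return {"t": 0,
            "m": [[0.0] * cols for _ in range(rank)],
            "v": [[0.0] * cols for _ in range(rank)]}


def apollo_like_step(grad, proj, state, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return (scaled_gradient, per-channel scales) for one step."""
    low = matmul(proj, grad)  # low-rank view of the gradient, shape (rank x cols)
    state["t"] += 1
    t = state["t"]
    m, v = state["m"], state["v"]
    adapted = []
    for i, row in enumerate(low):
        out_row = []
        for j, g in enumerate(row):
            # Adam-style moment updates, kept only for the low-rank gradient.
            m[i][j] = beta1 * m[i][j] + (1 - beta1) * g
            v[i][j] = beta2 * v[i][j] + (1 - beta2) * g * g
            m_hat = m[i][j] / (1 - beta1 ** t)
            v_hat = v[i][j] / (1 - beta2 ** t)
            out_row.append(m_hat / (math.sqrt(v_hat) + eps))
        adapted.append(out_row)
    # Channel-wise scale: how much the adaptive update stretched each column.
    scales = [a / (r + eps) for a, r in zip(column_norms(adapted), column_norms(low))]
    # Apply the per-channel scale to the original full-rank gradient.
    scaled = [[g * scales[j] for j, g in enumerate(row)] for row in grad]
    return scaled, scales
```

A tensor-wise variant would replace the per-column norms with a single ratio of the two matrix norms, giving one scalar for the whole tensor.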
Stars: 271
Forks: 13
Language: Python
License: not listed
Category:
Last pushed: Nov 29, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/zhuhanqing/APOLLO"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
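The same request can be made from Python using only the standard library. The sketch below is a minimal example under stated assumptions: the helper names `quality_url` and `fetch_quality` are hypothetical, and the response is assumed to be JSON, which this page does not state. No API key handling is shown because the page does not describe how a key is passed.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner, repo):
    """Build the endpoint URL for a given owner/repo pair."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner, repo, timeout=10.0):
    """Fetch the record for a repository (assumes a JSON response body)."""
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("zhuhanqing", "APOLLO")
```

With the keyless limit of 100 requests/day, it is worth caching the result locally rather than calling `fetch_quality` on every run.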
Higher-rated alternatives
zhenye234/xcodec
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Y-Research-SBU/CSRv2
Official Repository for CSRv2 - ICLR 2026
HITESHLPATEL/Mamba-Papers
Awesome Mamba Papers: A Curated Collection of Research Papers, Tutorials & Blogs
psychofict/llm-effective-context-length
Investigating Why the Effective Context Length of LLMs Falls Short (Based on STRING, ICLR 2025)
MouxiaoHuang/PPE
[ICLR 2026] Official code of PPE: Positional Preservation Embedding for Token Compression in...