zhuhanqing/APOLLO

APOLLO: SGD-like Memory, AdamW-level Performance; MLSys'25 Outstanding Paper Honorable Mention

/ 100

Emerging

Combines low-rank gradient approximation with channel-wise or tensor-wise scaling factors computed via pure random projections in an auxiliary space, eliminating expensive SVD computations. Integrated into Hugging Face Transformers, LLaMA-Factory, and axolotl, with support for distributed training via FSDP and quantization-aware variants (QAPOLLO with int8 weights).
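The description above can be illustrated with a minimal NumPy sketch of one optimizer step. This is not the repository's implementation: the function name `apollo_like_step`, the `state` dictionary, the Gaussian projection, and all hyperparameter defaults are assumptions made for illustration. The idea shown is that Adam-style moments are kept only for a randomly projected, low-rank copy of the gradient, and the ratio of column norms in that auxiliary space gives a channel-wise factor that rescales the full gradient, with no SVD involved.

```python
import numpy as np

def apollo_like_step(W, G, state, rank=4, lr=1e-3,
                     betas=(0.9, 0.999), eps=1e-8, seed=0):
    """One illustrative update (hypothetical helper, not the APOLLO API)."""
    m, n = G.shape
    if "P" not in state:
        # Assumption: a fixed Gaussian random projection replaces any SVD.
        rng = np.random.default_rng(seed)
        state["P"] = rng.standard_normal((rank, m)) / np.sqrt(rank)
        state["M"] = np.zeros((rank, n))  # first moment, low-rank space
        state["V"] = np.zeros((rank, n))  # second moment, low-rank space
        state["t"] = 0

    state["t"] += 1
    t = state["t"]
    b1, b2 = betas

    # Project the full gradient into the auxiliary low-rank space.
    R = state["P"] @ G

    # Adam-style moment estimates, stored only at size (rank, n).
    state["M"] = b1 * state["M"] + (1 - b1) * R
    state["V"] = b2 * state["V"] + (1 - b2) * R ** 2
    M_hat = state["M"] / (1 - b1 ** t)
    V_hat = state["V"] / (1 - b2 ** t)
    R_tilde = M_hat / (np.sqrt(V_hat) + eps)

    # Channel-wise factor: how much the adaptive update rescales each column.
    scale = np.linalg.norm(R_tilde, axis=0) / (np.linalg.norm(R, axis=0) + eps)

    # Apply the per-column factor to the raw full-rank gradient.
    return W - lr * G * scale, scale
```

In this sketch the optimizer state has shape `(rank, n)` rather than `(m, n)`, which is where the memory saving would come from; a tensor-wise variant would presumably collapse `scale` to a single scalar per weight matrix.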

271 stars.

No Package, No Dependents

Maintenance 6 / 25

Adoption 10 / 25

Maturity 9 / 25

Community 10 / 25

Stars

271

Forks

Language

Python

License

—

Higher-rated alternatives

zhenye234/xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Y-Research-SBU/CSRv2

Official Repository for CSRv2 - ICLR 2026

HITESHLPATEL/Mamba-Papers

Awesome Mamba Papers: A Curated Collection of Research Papers , Tutorials & Blogs

psychofict/llm-effective-context-length

Investigating Why the Effective Context Length of LLMs Falls Short (Based on STRING, ICLR 2025)

MouxiaoHuang/PPE

[ICLR 2026] Official code of PPE: Positional Preservation Embedding for Token Compression in...
