cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
Provides theoretical performance estimation for LLMs across diverse parallelism schemes (tensor, pipeline, sequence, expert, and data parallelism) and optimization techniques like activation recomputation and quantization. Integrates with Hugging Face model configs for automatic parameter extraction, while supporting custom JSON-based model/GPU/dtype specifications via CLI or Python API. Enables rapid what-if analysis of training/inference setups to identify feasible configurations and optimal throughput-latency tradeoffs without running actual experiments.
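The flavor of estimate such a tool produces can be sketched in a few lines of Python. This is an illustrative back-of-envelope calculation, not llm-analysis's actual implementation; the model shape numbers are assumptions roughly matching a 7B-class decoder-only model:

```python
# Back-of-envelope transformer weight-memory estimate (illustrative only;
# not llm-analysis's actual code). Shape numbers are assumptions.
BYTES_PER_DTYPE = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def param_count(hidden: int, layers: int, vocab: int) -> int:
    """Approximate decoder-only parameter count:
    ~12*h^2 per layer (attention + 4h-wide MLP) plus token embeddings."""
    return layers * 12 * hidden**2 + vocab * hidden

def weight_memory_gib(params: int, dtype: str) -> float:
    """Memory to hold the weights alone, in GiB, for a given dtype."""
    return params * BYTES_PER_DTYPE[dtype] / 2**30

params = param_count(hidden=4096, layers=32, vocab=32000)
print(f"{params / 1e9:.2f} B params")            # 6.57 B params
print(f"{weight_memory_gib(params, 'fp16'):.1f} GiB in fp16")  # 12.2 GiB
```

Real estimators like llm-analysis additionally account for activations, optimizer states, KV cache, and how each parallelism scheme shards these terms across devices.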
479 stars. No commits in the last 6 months.
Stars: 479
Forks: 56
Language: Python
License: Apache-2.0
Category:
Last pushed: Apr 19, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/cli99/llm-analysis"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
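The same endpoint can be queried from Python with only the standard library. A minimal sketch; the response schema is not documented on this page, so the helper just decodes whatever JSON comes back:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem: str, owner: str, repo: str) -> dict:
    """Fetch and decode the quality data (network call).
    The JSON fields are not documented here, so no schema is assumed."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

# Example call (performs a network request):
#   data = fetch_quality("transformers", "cli99", "llm-analysis")
#   print(json.dumps(data, indent=2))
```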
Higher-rated alternatives
TsinghuaC3I/MARTI
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
tanyuqian/redco
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to...
zjunlp/KnowLM
An Open-sourced Knowledgeable Large Language Model Framework.
ariannamethod/chuck.optimizer
Adam is blind. Chuck sees. Lee 4ever.
ykjaat6104/LLM-Cost-and-Token-Efficiency-Analysis
A benchmark study analyzing cost and token efficiency across 14 LLMs from 5 providers —...