cli99/llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
Provides theoretical performance estimation for LLMs across diverse parallelism schemes (tensor, pipeline, sequence, expert, and data parallelism) and optimization techniques like activation recomputation and quantization. Integrates with Hugging Face model configs for automatic parameter extraction, while supporting custom JSON-based model/GPU/dtype specifications via CLI or Python API. Enables rapid what-if analysis of training/inference setups to identify feasible configurations and optimal throughput-latency tradeoffs without running actual experiments.
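The flavor of estimate such a tool produces can be sketched in a few lines of Python. This is an illustrative back-of-envelope calculation, not llm-analysis's actual implementation; the model shape numbers are assumptions roughly matching a 7B-class decoder-only model:

```python
# Back-of-envelope transformer weight-memory estimate (illustrative only;
# not llm-analysis's actual code). Shape numbers are assumptions.
BYTES_PER_DTYPE = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def param_count(hidden: int, layers: int, vocab: int) -> int:
    """Approximate decoder-only parameter count:
    ~12*h^2 per layer (attention + 4h-wide MLP) plus token embeddings."""
    return layers * 12 * hidden**2 + vocab * hidden

def weight_memory_gib(params: int, dtype: str) -> float:
    """Memory to hold the weights alone, in GiB, for a given dtype."""
    return params * BYTES_PER_DTYPE[dtype] / 2**30

params = param_count(hidden=4096, layers=32, vocab=32000)
print(f"{params / 1e9:.2f} B params")            # 6.57 B params
print(f"{weight_memory_gib(params, 'fp16'):.1f} GiB in fp16")  # 12.2 GiB
```

Real estimators like llm-analysis additionally account for activations, optimizer states, KV cache, and how each parallelism scheme shards these terms across devices.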
479 stars. No commits in the last 6 months.
Stars: 479
Forks: 56
Language: Python
License: Apache-2.0
Category:
Last pushed: Apr 19, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/cli99/llm-analysis"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
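The same endpoint can be queried from Python with only the standard library. A minimal sketch; the response schema is not documented on this page, so the helper just decodes whatever JSON comes back:

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repository."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

def fetch_quality(ecosystem: str, owner: str, repo: str) -> dict:
    """Fetch and decode the quality data (network call).
    The JSON fields are not documented here, so no schema is assumed."""
    with urllib.request.urlopen(quality_url(ecosystem, owner, repo)) as resp:
        return json.load(resp)

# Example call (performs a network request):
#   data = fetch_quality("transformers", "cli99", "llm-analysis")
#   print(json.dumps(data, indent=2))
```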
Higher-rated alternatives
TsinghuaC3I/MARTI
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
tanyuqian/redco
NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to...
zjunlp/KnowLM
An Open-sourced Knowledgeable Large Language Model Framework.
ariannamethod/chuck.optimizer
Adam is blind. Chuck sees. Lee 4ever.
ykjaat6104/LLM-Cost-and-Token-Efficiency-Analysis
A benchmark study analyzing cost and token efficiency across 14 LLMs from 5 providers —...