cli99/llm-analysis

Latency and Memory Analysis of Transformer Models for Training and Inference

Score: 39/100 (Emerging)

Provides theoretical performance estimation for LLMs across diverse parallelism schemes (tensor, pipeline, sequence, expert, and data parallelism) and optimization techniques like activation recomputation and quantization. Integrates with Hugging Face model configs for automatic parameter extraction, while supporting custom JSON-based model/GPU/dtype specifications via CLI or Python API. Enables rapid what-if analysis of training/inference setups to identify feasible configurations and optimal throughput-latency tradeoffs without running actual experiments.
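
For a quick sense of the Python API, a minimal sketch of a what-if inference estimate follows. The entry point and keyword arguments are assumptions based on the description above (Hugging Face config lookup, built-in GPU specs, tensor parallelism); check the repo README for the exact interface.

# Minimal sketch: theoretical inference estimate via the Python API.
# `infer` and every keyword below are assumptions, not verified signatures.
from llm_analysis.analysis import infer  # assumed entry point

summary = infer(
    model_name="facebook/opt-1.3b",   # Hugging Face config fetched automatically
    gpu_name="a100-sxm-80gb",         # built-in GPU spec, per the description
    tp_size=1,                        # tensor parallelism degree
    batch_size_per_gpu=1,
    seq_len=1024,
    num_tokens_to_generate=256,
)
print(summary)  # theoretical latency/memory breakdown, no GPU required

Because the analysis is purely analytical, a sweep over tp_size or batch size runs in seconds, which is the rapid what-if workflow the description refers to.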

479 stars. No commits in the last 6 months.

Flags: Stale (6 months), No Package, No Dependents

Maintenance: 2/25
Adoption: 10/25
Maturity: 9/25
Community: 18/25

How are scores calculated? The four subscores, each out of 25, sum to the overall score: 2 + 10 + 9 + 18 = 39.

Stars: 479
Forks: 56
Language: Python
License: Apache-2.0
Last pushed: Apr 19, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/cli99/llm-analysis"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
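
The same request from Python, using only the standard library; the URL is the one from the curl example above, and the response is assumed to be JSON (schema not shown here):

import json
import urllib.request

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/cli99/llm-analysis"
with urllib.request.urlopen(url) as resp:  # anonymous access, 100 requests/day
    data = json.load(resp)                 # parse the JSON body
print(json.dumps(data, indent=2))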