thu-pacman/chitu

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Score: 85 / 100 (Verified)

Supports heterogeneous CPU+GPU inference and on-the-fly quantization conversion (FP4→FP8/BF16), enabling single-card deployment of 671B models. Provides production-grade stability across diverse hardware platforms—NVIDIA, Ascend, Moxin, and Hygon—with scalability from pure CPU setups to large distributed clusters. Optimized for models like DeepSeek, Qwen, and GLM with integrated batching and long-context efficiency techniques.
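The quantization-conversion claim is the mechanically interesting part: weights are kept in a low-precision storage format and expanded to a compute dtype as they are used. The sketch below is a generic illustration of that dequantize-on-the-fly idea in plain PyTorch, not chitu's actual code; the per-block scale layout and block size are assumptions, and it needs a PyTorch build with float8 dtypes.

import torch

def dequantize_fp8_to_bf16(w_fp8, scales, block=128):
    # Up-cast the stored FP8 values, then apply one scale per `block` columns.
    # The (per-output-row, per-column-block) scale layout is a hypothetical choice.
    w = w_fp8.to(torch.bfloat16)
    return w * scales.repeat_interleave(block, dim=1).to(torch.bfloat16)

out_features, in_features, block = 256, 512, 128
w_fp8 = torch.randn(out_features, in_features).to(torch.float8_e4m3fn)   # FP8 storage
scales = torch.rand(out_features, in_features // block) + 0.5            # per-block scales

x = torch.randn(8, in_features, dtype=torch.bfloat16)
y = x @ dequantize_fp8_to_bf16(w_fp8, scales, block).T                    # BF16 compute path
print(y.shape)   # torch.Size([8, 256])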

3,418 stars and 13 monthly downloads. Actively maintained with 128 commits in the last 30 days. Available on PyPI.

Maintenance 25 / 25
Adoption 13 / 25
Maturity 25 / 25
Community 22 / 25


Stars: 3,418
Forks: 477
Language: Python
License: Apache-2.0
Last pushed: Mar 13, 2026
Monthly downloads: 13
Commits (30d): 128
Dependencies: 21

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/thu-pacman/chitu"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
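The same endpoint can also be queried from Python. A minimal sketch, assuming the response is JSON (field names depend on the API's schema):

import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/thu-pacman/chitu"

resp = requests.get(url, timeout=10)   # anonymous tier: 100 requests/day
resp.raise_for_status()
data = resp.json()                     # keys depend on the API's response schema
print(data)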