thu-pacman/chitu
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
Supports heterogeneous CPU+GPU inference and on-the-fly quantization conversion (FP4→FP8/BF16), enabling single-card deployment of 671B models. Provides production-grade stability across diverse hardware platforms—NVIDIA, Ascend, Moxin, and Hygon—with scalability from pure CPU setups to large distributed clusters. Optimized for models like DeepSeek, Qwen, and GLM with integrated batching and long-context efficiency techniques.
3,418 stars and 13 monthly downloads. Actively maintained with 128 commits in the last 30 days. Available on PyPI.
Stars: 3,418
Forks: 477
Language: Python
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Monthly downloads: 13
Commits (30d): 128
Dependencies: 21
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/thu-pacman/chitu"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
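A minimal Python sketch of calling the same endpoint. The URL structure is taken from the curl example above; the assumption that the response body is JSON, and the helper names themselves, are illustrative rather than documented.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the quality-stats endpoint URL for a repository."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """Fetch repository stats (100 requests/day without a key).

    Assumes the endpoint returns a JSON object, which is not
    confirmed by the page text.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

url = quality_url("llm-tools", "thu-pacman", "chitu")
# → "https://pt-edge.onrender.com/api/v1/quality/llm-tools/thu-pacman/chitu"
```

With a free key, the higher 1,000/day limit presumably requires sending the key with each request; the exact header or query-parameter name is not given on this page.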
Related tools
NotPunchnox/rkllama
Ollama alternative for Rockchip NPU: An efficient solution for running AI and Deep learning...
sophgo/LLM-TPU
Run generative AI models in sophgo BM1684X/BM1688
Deep-Spark/DeepSparkHub
DeepSparkHub selects hundreds of application algorithms and models, covering various fields of...
HuaizhengZhang/AI-Infra-from-Zero-to-Hero
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for...
eth-sri/lmql
A language for constraint-guided and efficient LLM programming.