alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.
Supports inference and training across multiple frameworks (TensorFlow, Caffe, ONNX, TorchScript) with specialized runtimes for LLMs via MNN-LLM and diffusion models via MNN-Diffusion. Employs aggressive optimization strategies including FP16/Int8 quantization (50-70% size reduction), minimal dependencies, and platform-specific backends to achieve sub-2MB executable overhead on iOS and 800KB core library on Android. Integrates with MNN Workbench for model visualization and one-click deployment across mobile, embedded, and IoT devices.
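The 50-70% size-reduction figure follows from simple storage arithmetic: FP16 stores each weight in 2 bytes instead of FP32's 4 (a 50% cut), and Int8 in 1 byte (a 75% cut before quantization overhead such as per-channel scales or layers kept at higher precision). A back-of-envelope sketch, using a hypothetical 7B-parameter model purely for illustration:

```python
def model_size_gib(params: int, bytes_per_weight: float) -> float:
    """Approximate on-disk weight size in GiB for a given storage precision."""
    return params * bytes_per_weight / (1024 ** 3)

params = 7_000_000_000            # illustrative 7B-parameter model, not an MNN-specific figure
fp32 = model_size_gib(params, 4)  # 32-bit floats: 4 bytes per weight
fp16 = model_size_gib(params, 2)  # FP16 halves storage -> 50% reduction
int8 = model_size_gib(params, 1)  # Int8 quarters storage -> 75% reduction (before overhead)

print(f"FP32: {fp32:.1f} GiB, FP16: {fp16:.1f} GiB, Int8: {int8:.1f} GiB")
print(f"FP16 saves {100 * (1 - fp16 / fp32):.0f}%, Int8 saves {100 * (1 - int8 / fp32):.0f}%")
```

In practice mixed-precision schemes keep some tensors at higher precision, which is consistent with the 50-70% range quoted above rather than the theoretical 75% maximum for pure Int8.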
14,526 stars and 220,239 monthly downloads. Actively maintained with 77 commits in the last 30 days. Available on PyPI.
Stars: 14,526
Forks: 2,234
Language: C++
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Monthly downloads: 220,239
Commits (30d): 77
Dependencies: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/alibaba/MNN"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
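The same request can be made from Python. Only the URL shape is taken from the curl example above; the response schema is not documented here, so the sketch below just builds the endpoint URL and parses whatever JSON comes back (the network call is left commented out):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, repo: str) -> str:
    """Build the quality-endpoint URL for a repo; path shape follows the curl example."""
    return f"{BASE}/{ecosystem}/{repo}"

url = quality_url("transformers", "alibaba/MNN")

# Unauthenticated access is rate-limited to 100 requests/day (1,000/day with a free key).
# Uncomment to fetch live data; the JSON field names are not specified on this page:
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
#     print(data)
```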
Related projects
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...
ARahim3/mlx-tune
Bringing the Unsloth experience to Mac users via Apple's MLX framework