mlc-ai/mlc-llm

Universal LLM Deployment Engine with ML Compilation

/ 100

Established

Compiles LLMs to optimized machine code via TVM's ML compilation framework, then executes them through MLCEngine—a unified inference runtime supporting diverse backends (CUDA, ROCm, Metal, WebGPU, OpenCL) across GPUs, mobile devices, and browsers. Exposes OpenAI-compatible REST and language-specific APIs (Python, JavaScript, iOS, Android) from the same compiled engine, enabling model-agnostic deployment without framework lock-in.

22,185 stars. Actively maintained with 15 commits in the last 30 days.

No Package No Dependents

Maintenance 20 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 19 / 25

How are scores calculated?

Stars

22,185

Forks

1,960

Language

Python

License

Apache-2.0

Compare

mlc-llm and llm-deploy

Related models

PaddlePaddle/FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

skyzh/tiny-llm

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny...

ServerlessLLM/ServerlessLLM

Serverless LLM Serving for Everyone.

AXERA-TECH/ax-llm

Explore LLM model deployment based on AXera's AI chips

VectorInstitute/vector-inference

Efficient LLM inference on Slurm clusters.

Explore Transformer Models

All categories Trending Transformer directory Insights