mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
Supports both LLMs and vision-language models (VLMs) through AWQ and SmoothQuant quantization, enabling 4-bit inference with minimal accuracy loss. Built as zero-dependency C/C++ for cross-platform compatibility (x86, ARM including Apple M1/M2 and Raspberry Pi, and CUDA GPUs), with optimized kernels for each architecture. A pre-quantized model zoo on Hugging Face enables immediate deployment without requiring any compression infrastructure of your own.
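As a rough illustration of the 4-bit scheme these quantization techniques target, below is a minimal Python sketch of symmetric per-group weight quantization. The group size, the numpy round-trip, and the function names are assumptions for exposition only; TinyChatEngine's real kernels are optimized C/C++ and CUDA, and AWQ itself additionally rescales salient channels before quantizing.

import numpy as np

# Illustrative sketch of symmetric per-group 4-bit quantization
# (an assumption for exposition; not TinyChatEngine's actual kernel code).
def quantize_4bit(w, group_size=128):
    flat = w.reshape(-1, group_size)                       # split weights into groups
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0  # one fp32 scale per group
    scale = np.maximum(scale, 1e-12)                       # guard all-zero groups
    q = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)  # int4 range [-8, 7]
    return q, scale

def dequantize_4bit(q, scale, shape):
    return (q.astype(np.float32) * scale).reshape(shape)

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, w.shape)
print("mean abs round-trip error:", np.abs(w - w_hat).mean())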
944 stars. No commits in the last 6 months.
Stars: 944
Forks: 95
Language: C++
License: MIT
Category:
Last pushed: Jul 04, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/mit-han-lab/TinyChatEngine"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
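For programmatic use, here is a minimal Python sketch that calls the endpoint above. The JSON field names it prints are guesses inferred from the stats shown on this page, not a documented response schema; adjust them to whatever the API actually returns.

import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/mit-han-lab/TinyChatEngine")

resp = requests.get(URL, timeout=10)   # no key needed within the free tier
resp.raise_for_status()
data = resp.json()

# Field names below are assumptions based on the stats listed on this page.
for key in ("stars", "forks", "language", "license", "last_pushed"):
    print(key, "->", data.get(key, "<not in response>"))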
Higher-rated alternatives
jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the...
waybarrios/vllm-mlx
OpenAI- and Anthropic-compatible server for Apple Silicon. Run LLMs and vision-language models...
jordanhubbard/nanolang
A tiny experimental language designed to be targeted by coding LLMs
josStorer/RWKV-Runner
A RWKV management and startup tool, fully automated and only 8 MB. Also provides an interface...
akivasolutions/tightwad
Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative decoding proxy gives you...