mit-han-lab/TinyChatEngine
TinyChatEngine: On-Device LLM Inference Library
Supports both LLMs and vision-language models (VLMs) through AWQ and SmoothQuant quantization, enabling 4-bit inference with minimal accuracy loss. Built as zero-dependency C/C++ for cross-platform compatibility (x86, ARM including Apple M1/M2 and Raspberry Pi, and CUDA GPUs), with optimized kernels for each architecture. A pre-quantized model zoo on Hugging Face enables immediate deployment without requiring any compression infrastructure of your own.
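As a rough illustration of the 4-bit scheme these quantization techniques target, below is a minimal Python sketch of symmetric per-group weight quantization. The group size, the numpy round-trip, and the function names are assumptions for exposition only; TinyChatEngine's real kernels are optimized C/C++ and CUDA, and AWQ itself additionally rescales salient channels before quantizing.

import numpy as np

# Illustrative sketch of symmetric per-group 4-bit quantization
# (an assumption for exposition; not TinyChatEngine's actual kernel code).
def quantize_4bit(w, group_size=128):
    flat = w.reshape(-1, group_size)                       # split weights into groups
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0  # one fp32 scale per group
    scale = np.maximum(scale, 1e-12)                       # guard all-zero groups
    q = np.clip(np.round(flat / scale), -8, 7).astype(np.int8)  # int4 range [-8, 7]
    return q, scale

def dequantize_4bit(q, scale, shape):
    return (q.astype(np.float32) * scale).reshape(shape)

w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s, w.shape)
print("mean abs round-trip error:", np.abs(w - w_hat).mean())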
944 stars. No commits in the last 6 months.
Stars: 944
Forks: 95
Language: C++
License: MIT
Category:
Last pushed: Jul 04, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/mit-han-lab/TinyChatEngine"
Open to everyone: 100 requests/day with no key required; a free key raises the limit to 1,000/day.
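For programmatic use, here is a minimal Python sketch that calls the endpoint above. The JSON field names it prints are guesses inferred from the stats shown on this page, not a documented response schema; adjust them to whatever the API actually returns.

import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "llm-tools/mit-han-lab/TinyChatEngine")

resp = requests.get(URL, timeout=10)   # no key needed within the free tier
resp.raise_for_status()
data = resp.json()

# Field names below are assumptions based on the stats listed on this page.
for key in ("stars", "forks", "language", "license", "last_pushed"):
    print(key, "->", data.get(key, "<not in response>"))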
Higher-rated alternatives
jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the...
waybarrios/vllm-mlx
OpenAI- and Anthropic-compatible server for Apple Silicon. Run LLMs and vision-language models...
jordanhubbard/nanolang
A tiny experimental language designed to be targeted by coding LLMs
josStorer/RWKV-Runner
A RWKV management and startup tool, fully automated and only 8 MB. Also provides an interface...
akivasolutions/tightwad
Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. Speculative decoding proxy gives you...