EricLBuehler/mistral.rs
Fast, flexible LLM inference
Supports any Hugging Face model with zero configuration, handling multimodal inputs (text, vision, video, audio) and quantization formats (GGUF, GPTQ, AWQ, FP8) seamlessly. Built on continuous batching, FlashAttention, PagedAttention, and optional multi-GPU tensor parallelism for optimized throughput. Provides Python/Rust SDKs, an integrated web UI, hardware auto-tuning, and agentic capabilities including tool calling and MCP client support.
6,681 stars. Actively maintained with 18 commits in the last 30 days.
Stars: 6,681
Forks: 540
Language: Rust
License: MIT
Category: —
Last pushed: Feb 27, 2026
Commits (30d): 18
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/EricLBuehler/mistral.rs"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
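The same endpoint can be called programmatically. The sketch below builds the URL for an arbitrary owner/repo pair and formats a response payload; note that the JSON field names (`stars`, `forks`, `commits_30d`, `license`) are assumptions for illustration, since the response schema is not documented on this page.

```python
from urllib.parse import quote

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub owner/repo pair."""
    # quote() escapes any characters that need it; names like
    # "mistral.rs" pass through unchanged.
    return f"{API_BASE}/{quote(owner)}/{quote(repo)}"

# A hypothetical response payload -- these field names are assumed,
# not taken from documented API output.
sample = {"stars": 6681, "forks": 540, "commits_30d": 18, "license": "MIT"}

def summarize(data: dict) -> str:
    """Render the assumed payload as a one-line summary."""
    return (f"{data['stars']} stars, {data['forks']} forks, "
            f"{data['commits_30d']} commits in 30 days ({data['license']})")

print(quality_url("EricLBuehler", "mistral.rs"))
print(summarize(sample))
```

In practice you would fetch `quality_url(...)` with any HTTP client and pass the decoded JSON to a formatter like `summarize`, adjusting the field names to whatever the API actually returns.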
Related models
nerdai/llms-from-scratch-rs
A comprehensive Rust translation of the code from Sebastian Raschka's Build an LLM from Scratch book.
brontoguana/krasis
Krasis is a Hybrid LLM runtime which focuses on efficient running of larger models on consumer...
ShelbyJenkins/llm_utils
llm_utils: Basic LLM tools, best practices, and minimal abstraction.