EricLBuehler/mistral.rs
Fast, flexible LLM inference
Supports any Hugging Face model with zero configuration, handling multimodal inputs (text, vision, video, audio) and quantization formats (GGUF, GPTQ, AWQ, FP8) seamlessly. Built on continuous batching, FlashAttention, PagedAttention, and optional multi-GPU tensor parallelism for optimized throughput. Provides Python/Rust SDKs, an integrated web UI, hardware auto-tuning, and agentic capabilities including tool calling and MCP client support.
6,681 stars. Actively maintained with 18 commits in the last 30 days.
Stars: 6,681
Forks: 540
Language: Rust
License: MIT
Category: —
Last pushed: Feb 27, 2026
Commits (30d): 18
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/EricLBuehler/mistral.rs"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000 requests/day.
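The same endpoint can be called programmatically. The sketch below builds the URL for an arbitrary owner/repo pair and formats a response payload; note that the JSON field names (`stars`, `forks`, `commits_30d`, `license`) are assumptions for illustration, since the response schema is not documented on this page.

```python
from urllib.parse import quote

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub owner/repo pair."""
    # quote() escapes any characters that need it; names like
    # "mistral.rs" pass through unchanged.
    return f"{API_BASE}/{quote(owner)}/{quote(repo)}"

# A hypothetical response payload -- these field names are assumed,
# not taken from documented API output.
sample = {"stars": 6681, "forks": 540, "commits_30d": 18, "license": "MIT"}

def summarize(data: dict) -> str:
    """Render the assumed payload as a one-line summary."""
    return (f"{data['stars']} stars, {data['forks']} forks, "
            f"{data['commits_30d']} commits in 30 days ({data['license']})")

print(quality_url("EricLBuehler", "mistral.rs"))
print(summarize(sample))
```

In practice you would fetch `quality_url(...)` with any HTTP client and pass the decoded JSON to a formatter like `summarize`, adjusting the field names to whatever the API actually returns.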
Related models
nerdai/llms-from-scratch-rs
A comprehensive Rust translation of the code from Sebastian Raschka's Build an LLM from Scratch book.
brontoguana/krasis
Krasis is a Hybrid LLM runtime which focuses on efficient running of larger models on consumer...
ShelbyJenkins/llm_utils
llm_utils: Basic LLM tools, best practices, and minimal abstraction.