izmttk/ullm
Lightweight LLM inference engine inspired by nano-vllm, with radix-tree based prefix cache, tp & pp, cuda graph, openai api, async scheduling, and more.
Stars
9
Forks
—
Language
Python
License
MIT
Category
Last pushed
Mar 10, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/izmttk/ullm"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
Goekdeniz-Guelmez/mlx-lm-lora
Train Large Language Models on MLX.
uber-research/PPLM
Plug and Play Language Model implementation. Allows to steer topic and attributes of GPT-2 models.
jarobyte91/pytorch_beam_search
A lightweight implementation of Beam Search for sequence models in PyTorch.
SmallDoges/small-doge
Doge Family of Small Language Models
VHellendoorn/Code-LMs
Guide to using pre-trained large language models of source code