xkiwilabs/llm-inference-hub
A reproducible LLM inference stack built on vLLM + LiteLLM, designed for multi-GPU Ubuntu workstations. Serves multiple models simultaneously over a single OpenAI-compatible API.
Stars
—
Forks
—
Language
Shell
License
—
Category
Last pushed
Mar 28, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/xkiwilabs/llm-inference-hub"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
hassancs91/SimplerLLM
Simplify interactions with Large Language Models
tylerelyt/LLM-Workshop
🌟 Learn Large Language Model development through hands-on projects and real-world implementations
avilum/minrlm
Token-efficient Recursive Language Model. 3.6x fewer tokens than vanilla LLMs. Data never enters...
kyegomez/SingLoRA
This repository provides a minimal, single-file implementation of SingLoRA (Single Matrix...
NetEase-Media/grps_trtllm
Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM...