Rohit2sali/vllm-multi-tenant-llm-gateway
A multi-tenant large language model gateway built on vLLM. The system is designed to serve many concurrent requests from many users: it uses vLLM as its inference engine, a scheduler to order incoming user queries, and a rate limiter to cap each user's usage. It also supports LoRA adapters through vLLM.
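The per-user limiter described above could be sketched as a minimal token-bucket rate limiter. This is an illustrative sketch only; the class name `TokenBucket`, its parameters, and the per-tenant keying are assumptions, not taken from the repository's code.

```python
import time
from collections import defaultdict


class TokenBucket:
    """Hypothetical per-tenant token-bucket limiter: each tenant may make
    `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        # Each tenant starts with a full bucket; last-seen timestamps are
        # created lazily on first access.
        self.tokens = defaultdict(lambda: float(capacity))
        self.last = defaultdict(time.monotonic)

    def allow(self, tenant: str) -> bool:
        """Return True and consume one token if the tenant is under its limit."""
        now = time.monotonic()
        elapsed = now - self.last[tenant]
        self.last[tenant] = now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens[tenant] = min(
            self.capacity, self.tokens[tenant] + elapsed * self.rate
        )
        if self.tokens[tenant] >= 1.0:
            self.tokens[tenant] -= 1.0
            return True
        return False
```

A gateway would call `allow(tenant_id)` before enqueueing a request with the scheduler, rejecting (or deferring) requests from tenants that have exhausted their budget while leaving other tenants unaffected.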
Stars: —
Forks: —
Language: Jupyter Notebook
License: —
Category:
Last pushed: Mar 05, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Rohit2sali/vllm-multi-tenant-llm-gateway"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
robert-mcdermott/ollama-batch-cluster
Large Scale Batch Processing with Ollama
anmolg1997/Multi-LoRA-Serve
Multi-adapter inference gateway — one base model, many LoRA adapters per-request,...
kimmmmyy223/llm-batch
🚀 Process JSON data in batches with `llm-batch`, leveraging sequential or parallel modes for...