nshkrdotcom/vllm
vLLM - High-throughput, memory-efficient LLM inference engine with PagedAttention, continuous batching, CUDA/HIP optimization, quantization (GPTQ/AWQ/INT4/INT8/FP8), tensor/pipeline parallelism, OpenAI-compatible API, multi-GPU/TPU/Neuron support, prefix caching, and multi-LoRA capabilities
Stars
—
Forks
—
Language
Elixir
License
MIT
Category
Last pushed
Feb 07, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/nshkrdotcom/vllm"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
limix-ldm-ai/LimiX
LimiX: Unleashing Structured-Data Modeling Capability for Generalist Intelligence...
XXO47OXX/layer-scan
Automated LLM layer duplication config scanner — find the optimal (i,j) for any model + task
tatsu-lab/stanford_alpaca
Code and documentation to train Stanford's Alpaca models, and generate the data.
google-research/plur
PLUR (Programming-Language Understanding and Repair) is a collection of source code datasets...
thuml/LogME
Code release for "LogME: Practical Assessment of Pre-trained Models for Transfer Learning" (ICML...