akivasolutions/tightwad
Pool your CUDA + ROCm GPUs into one OpenAI-compatible API. The speculative decoding proxy gives you 2-3x faster inference — for free, using hardware you already own. Stop renting GPU clouds. Be a tightwad.
Available on PyPI.
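Because tightwad exposes an OpenAI-compatible API, any standard OpenAI client should be able to talk to the pooled GPUs. A minimal sketch, assuming the proxy runs locally on port 8000 and serves a model named "llama-3.1-8b" (the endpoint address and model name are illustrative assumptions, not documented values):

# Minimal sketch: chat completion against an assumed local tightwad endpoint.
# base_url, port, and model name are assumptions for illustration only.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed tightwad proxy address
    api_key="unused",                     # local proxies often ignore the key
)

response = client.chat.completions.create(
    model="llama-3.1-8b",  # hypothetical model name
    messages=[{"role": "user", "content": "Summarize speculative decoding in one sentence."}],
)
print(response.choices[0].message.content)

For context, speculative decoding pairs a small draft model with the large target model: the draft proposes several tokens cheaply and the target verifies them in one batched pass, which is where the claimed 2-3x speedup comes from.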
Stars: 4
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Mar 28, 2026
Monthly downloads: 330
Commits (30d): 0
Dependencies: 6
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/akivasolutions/tightwad"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
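The same stats can be fetched from Python; a minimal sketch using requests (the payload schema is not documented here, so the example just prints whatever JSON comes back):

# Minimal sketch: fetching repo quality data from the public endpoint.
# No key needed under the 100 requests/day free tier described above.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/akivasolutions/tightwad"
resp = requests.get(url, timeout=10)
resp.raise_for_status()
print(resp.json())  # field names are undocumented here, so dump the full payload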
Related tools
jundot/omlx
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the...
jordanhubbard/nanolang
A tiny experimental language designed to be targeted by coding LLMs
waybarrios/vllm-mlx
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models...
josStorer/RWKV-Runner
An RWKV management and startup tool, fully automated, only 8MB. Also provides an interface...
petrukha-ivan/mlx-swift-structured
Structured output generation in Swift