containers/ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
It automatically detects host GPU capabilities and pulls optimized container images (CUDA, ROCm, Intel GPU, etc.), eliminating manual driver configuration, and it supports multiple AI model registries, including OCI container registries. Models are managed through familiar container commands and served for inference via a REST API or an interactive chatbot interface, running in isolated rootless containers with network access disabled by default.
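As a sketch of that workflow, the commands below use RamaLama's documented container-style CLI; the model reference and the port are illustrative assumptions, not prescriptions.

# Pull a model, chat with it interactively, then serve it over a REST API.
# ollama:// is one of RamaLama's supported transports; tinyllama is an
# illustrative model name, and 8080 is an assumed port choice.
ramalama pull ollama://tinyllama
ramalama run ollama://tinyllama
ramalama serve --port 8080 ollama://tinyllama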
2,640 stars. Used by 1 other package. Actively maintained with 160 commits in the last 30 days. Available on PyPI.
Stars: 2,640
Forks: 305
Language: Python
License: MIT
Category: llm-tools
Last pushed: Mar 12, 2026
Commits (30d): 160
Dependencies: 4
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/containers/ramalama"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
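For scripted use, the same endpoint can be consumed from the shell. A minimal sketch, assuming only that the endpoint returns JSON; no specific response fields are assumed:

# Fetch the quality record and pretty-print the JSON response.
# jq's identity filter (.) avoids assuming any particular field names.
curl -s "https://pt-edge.onrender.com/api/v1/quality/llm-tools/containers/ramalama" | jq .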
Related tools
av/harbor
One command brings a complete pre-wired LLM stack with hundreds of services to explore.
RunanywhereAI/runanywhere-sdks
Production-ready toolkit for running AI locally
runpod-workers/worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
vtuber-plan/olah
Self-hosted Hugging Face mirror service.
foldl/chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)