containers/ramalama
RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.
It automatically detects host GPU capabilities and pulls optimized container images (CUDA, ROCm, Intel GPU, etc.), eliminating manual driver configuration, and it supports multiple AI model registries, including OCI container registries. Models are managed through familiar container commands and served for inference via a REST API or an interactive chatbot interface, running in isolated rootless containers with network access disabled by default.
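As a sketch of that workflow, the commands below use RamaLama's documented container-style CLI; the model reference and the port are illustrative assumptions, not prescriptions.

# Pull a model, chat with it interactively, then serve it over a REST API.
# ollama:// is one of RamaLama's supported transports; tinyllama is an
# illustrative model name, and 8080 is an assumed port choice.
ramalama pull ollama://tinyllama
ramalama run ollama://tinyllama
ramalama serve --port 8080 ollama://tinyllama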
2,640 stars. Used by 1 other package. Actively maintained with 160 commits in the last 30 days. Available on PyPI.
Stars: 2,640
Forks: 305
Language: Python
License: MIT
Category: llm-tools
Last pushed: Mar 12, 2026
Commits (30d): 160
Dependencies: 4
Reverse dependents: 1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/containers/ramalama"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
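For scripted use, the same endpoint can be consumed from the shell. A minimal sketch, assuming only that the endpoint returns JSON; no specific response fields are assumed:

# Fetch the quality record and pretty-print the JSON response.
# jq's identity filter (.) avoids assuming any particular field names.
curl -s "https://pt-edge.onrender.com/api/v1/quality/llm-tools/containers/ramalama" | jq .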
Related tools
av/harbor
One command brings a complete pre-wired LLM stack with hundreds of services to explore.
RunanywhereAI/runanywhere-sdks
Production-ready toolkit for running AI locally
runpod-workers/worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
vtuber-plan/olah
Self-hosted Hugging Face mirror service.
foldl/chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU)