containers/ramalama
RamaLama is an open-source developer tool that makes it simple to serve AI models locally, from any source, and use them for inference in production, all through the familiar language of containers.
RamaLama lets developers run and serve AI models for various tasks on their local machine, treating them like familiar containers. It takes an AI model from any source and serves it securely and locally, accessible via a REST API or as a chatbot. It is aimed at developers and engineers who want to integrate AI model inference into their applications without complex system setup.
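Once a model is served locally, an application can talk to it over HTTP. A minimal sketch, assuming the local server exposes an OpenAI-style `/v1/chat/completions` endpoint on port 8080 (the port, model name, and endpoint path here are illustrative assumptions, not confirmed by this listing):

```python
import json
import urllib.request

def build_chat_request(prompt, model="default", base_url="http://localhost:8080"):
    """Build an OpenAI-style chat-completion request for a locally served model.

    The endpoint path and payload shape are assumptions about the local server.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def send_chat(req):
    """Send the request and return the parsed JSON body (needs a running server)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Building the request separately from sending it keeps the sketch testable without a running model server.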
2,640 stars. Used by 1 other package. Actively maintained with 154 commits in the last 30 days. Available on PyPI.
Use this if you are a developer looking for a straightforward way to run and manage AI models locally for development or production inference, leveraging container-based workflows.
Not ideal if you are an end-user without programming knowledge or if you need a fully managed, cloud-based AI model serving solution.
Stars
2,640
Forks
305
Language
Python
License
MIT
Category
Last pushed
Mar 12, 2026
Commits (30d)
154
Dependencies
4
Reverse dependents
1
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/containers/ramalama"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
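The same endpoint can be called from code. A small sketch, assuming the API returns JSON for a GitHub-style "owner/name" repo path (how an API key is attached to keyed requests is not documented here, so this keyless version only builds the URL and fetches it):

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(repo):
    """Build the quality-API URL for a GitHub-style "owner/name" repo path."""
    return f"{API_BASE}/{repo}"

def fetch_quality(repo):
    """Fetch the quality record; assumes the response body is JSON."""
    with urllib.request.urlopen(quality_url(repo)) as resp:
        return json.load(resp)
```

For example, `fetch_quality("containers/ramalama")` would hit the URL shown in the curl command above.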
Related models
runpod-workers/worker-vllm
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
eastriverlee/LLM.swift
LLM.swift is a simple and readable library that allows you to interact with large language...
beehive-lab/GPULlama3.java
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
gitkaz/mlx_gguf_server
This is a FastAPI based LLM server. Load multiple LLM models (MLX or llama.cpp) simultaneously...
Scottcjn/llama-cpp-power8
AltiVec/VSX optimized llama.cpp for IBM POWER8