openvinotoolkit/model_server
A scalable inference server for models optimized with OpenVINO™
A C++ implementation optimized for Intel hardware that exposes models over gRPC and REST, with OpenAI-compatible APIs for text generation, embeddings, image generation, and speech processing. Supports model composition through directed acyclic graph (DAG) pipelines with custom nodes, dynamic batching, and multi-framework model loading (TensorFlow, ONNX, PaddlePaddle). Implements the KServe and TensorFlow Serving protocols, loads models from local storage, object storage, or Hugging Face, and deploys on Docker, bare metal, Kubernetes, or Windows.
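As a rough illustration of the OpenAI-compatible REST API mentioned above, the sketch below builds a chat-completions request. The host, port, model name, and the `/v3/chat/completions` path are assumptions for a typical local deployment, not values taken from this page; check the model_server documentation for your actual endpoint.

```python
import json

def build_chat_request(model, prompt, host="localhost", port=8000):
    """Build (url, body) for an OpenAI-style chat-completions call.

    The /v3/chat/completions path and default host/port are assumed
    here for illustration; adjust to match your deployment.
    """
    url = f"http://{host}:{port}/v3/chat/completions"
    payload = {
        "model": model,  # name the server registered for the model
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return url, json.dumps(payload)

url, body = build_chat_request("my-llm", "Hello!")
print(url)
# Send with any HTTP client, e.g.:
#   import urllib.request
#   req = urllib.request.Request(url, body.encode(),
#                                {"Content-Type": "application/json"})
#   resp = urllib.request.urlopen(req).read()
```

Because the request shape follows the OpenAI API, existing OpenAI client libraries can usually be pointed at the server by overriding their base URL.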
836 stars. Actively maintained with 35 commits in the last 30 days.
Stars: 836
Forks: 241
Language: C++
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 35
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/openvinotoolkit/model_server"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
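The endpoint above follows a category/owner/repo pattern; a minimal sketch of building it programmatically is below. Only the base URL and path shown in the curl command come from this page; the helper function and its parameter names are illustrative.

```python
from urllib.parse import quote

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("generative-ai", "openvinotoolkit", "model_server")
print(url)
# Fetch with any HTTP client (no key needed, up to 100 requests/day):
#   import urllib.request
#   data = urllib.request.urlopen(url).read()
```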
Related tools
madroidmaq/mlx-omni-server
MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically...
NVIDIA-NeMo/Guardrails
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based...
rhesis-ai/rhesis
Open-source platform & SDK for testing LLM and agentic apps. Define expected behavior, generate...
taco-group/OpenEMMA
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
generative-computing/mellea
Mellea is a library for writing generative programs.