openvinotoolkit/model_server
A scalable inference server for models optimized with OpenVINO™
A C++ implementation optimized for Intel hardware that exposes models over gRPC and REST, with OpenAI-compatible APIs for text generation, embeddings, image generation, and speech processing. Supports model composition through directed acyclic graph (DAG) pipelines with custom nodes, dynamic batching, and multi-framework model loading (TensorFlow, ONNX, PaddlePaddle). Implements the KServe and TensorFlow Serving protocols, loads models from local storage, object storage, or Hugging Face, and deploys on Docker, bare metal, Kubernetes, or Windows.
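As a rough illustration of the OpenAI-compatible REST API mentioned above, the sketch below builds a chat-completions request. The host, port, model name, and the `/v3/chat/completions` path are assumptions for a typical local deployment, not values taken from this page; check the model_server documentation for your actual endpoint.

```python
import json

def build_chat_request(model, prompt, host="localhost", port=8000):
    """Build (url, body) for an OpenAI-style chat-completions call.

    The /v3/chat/completions path and default host/port are assumed
    here for illustration; adjust to match your deployment.
    """
    url = f"http://{host}:{port}/v3/chat/completions"
    payload = {
        "model": model,  # name the server registered for the model
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return url, json.dumps(payload)

url, body = build_chat_request("my-llm", "Hello!")
print(url)
# Send with any HTTP client, e.g.:
#   import urllib.request
#   req = urllib.request.Request(url, body.encode(),
#                                {"Content-Type": "application/json"})
#   resp = urllib.request.urlopen(req).read()
```

Because the request shape follows the OpenAI API, existing OpenAI client libraries can usually be pointed at the server by overriding their base URL.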
836 stars. Actively maintained with 35 commits in the last 30 days.
Stars: 836
Forks: 241
Language: C++
License: Apache-2.0
Category:
Last pushed: Mar 13, 2026
Commits (30d): 35
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/generative-ai/openvinotoolkit/model_server"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
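The endpoint above follows a category/owner/repo pattern; a minimal sketch of building it programmatically is below. Only the base URL and path shown in the curl command come from this page; the helper function and its parameter names are illustrative.

```python
from urllib.parse import quote

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    """Build the per-repository quality endpoint URL."""
    return f"{API_BASE}/{quote(category)}/{quote(owner)}/{quote(repo)}"

url = quality_url("generative-ai", "openvinotoolkit", "model_server")
print(url)
# Fetch with any HTTP client (no key needed, up to 100 requests/day):
#   import urllib.request
#   data = urllib.request.urlopen(url).read()
```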
Related tools
madroidmaq/mlx-omni-server
MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically...
NVIDIA-NeMo/Guardrails
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based...
rhesis-ai/rhesis
Open-source platform & SDK for testing LLM and agentic apps. Define expected behavior, generate...
taco-group/OpenEMMA
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
generative-computing/mellea
Mellea is a library for writing generative programs.