aws-solutions-library-samples/guidance-for-scalable-model-inference-and-agentic-ai-on-amazon-eks
A comprehensive, scalable ML inference architecture on Amazon EKS that uses Graviton processors for cost-effective CPU-based inference and GPU instances for accelerated inference. The guidance provides an end-to-end platform for deploying LLMs with agentic AI capabilities, including RAG (Retrieval-Augmented Generation) and MCP (Model Context Protocol).
Built on EKS with Karpenter for dynamic scaling, the solution orchestrates multi-agent workflows using the Strands Agent SDK, while LiteLLM provides a unified OpenAI-compatible gateway across Ray Serve and vLLM inference engines. Key integration points include Amazon OpenSearch for RAG, Langfuse for LLM observability, and MCP servers for external tools such as Tavily web search, yielding a production-grade agentic AI platform with automated feedback loops and quality assurance via Bedrock-hosted evaluators.
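Because LiteLLM exposes an OpenAI-compatible gateway, any OpenAI-style chat completion request body works against it unchanged. A minimal sketch of building such a request with only the standard library; the model alias `"vllm-llama"` is a hypothetical example, not a name taken from this repo:

```python
import json

def chat_payload(model: str, question: str) -> str:
    """Build an OpenAI-style chat completion request body,
    which an OpenAI-compatible gateway such as LiteLLM accepts."""
    body = {
        "model": model,  # alias the gateway routes to a Ray Serve / vLLM backend
        "messages": [{"role": "user", "content": question}],
    }
    return json.dumps(body)

# Example: POST this body to the gateway's /v1/chat/completions endpoint.
payload = chat_payload("vllm-llama", "What is Karpenter?")
```

The same payload can be sent with the official OpenAI SDK by pointing its base URL at the gateway, which is the usual way clients consume a LiteLLM deployment.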
Stars: 21
Forks: 9
Language: Python
License: MIT-0
Category:
Last pushed: Feb 14, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/mlops/aws-solutions-library-samples/guidance-for-scalable-model-inference-and-agentic-ai-on-amazon-eks"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.