detail-novelist/novelist-triton-server
Deploy KoGPT with Triton Inference Server
This project helps machine learning engineers and MLOps specialists deploy KakaoBrain's KoGPT Korean language model for efficient, real-time inference. It provides a structured way to take the raw KoGPT model weights and package them for serving with NVIDIA Triton Inference Server. The result is a production-ready server that can handle requests for text generation and other language tasks.
No commits in the last 6 months.
Use this if you need to deploy a KoGPT language model on NVIDIA GPUs and require fast, optimized inference performance in a production environment.
Not ideal if you are looking for a simple Python library for local model inference or if you don't have access to NVIDIA GPUs.
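As a rough sketch of what "setting up the model within the server" involves: Triton serves models from a versioned model repository on disk. The layout below follows Triton's documented convention (a model directory containing `config.pbtxt` plus numbered version subdirectories); the model name `kogpt`, the `python` backend, and the tensor names are illustrative assumptions, not details taken from this repository.

```python
# Sketch: build a minimal Triton model repository layout for a text model.
# The directory convention (model_name/config.pbtxt + numbered version dirs)
# is Triton's standard; the model name "kogpt" and the tensor names here
# are illustrative assumptions, not taken from this repository.
from pathlib import Path

CONFIG_PBTXT = """\
name: "kogpt"
backend: "python"
max_batch_size: 8
input [
  { name: "INPUT_TEXT", data_type: TYPE_STRING, dims: [ 1 ] }
]
output [
  { name: "OUTPUT_TEXT", data_type: TYPE_STRING, dims: [ 1 ] }
]
"""

def make_model_repo(root: str) -> Path:
    """Create model_repository/kogpt/{config.pbtxt, 1/} under root."""
    model_dir = Path(root) / "model_repository" / "kogpt"
    # Version directory "1" is where the weights / model.py would live.
    (model_dir / "1").mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(CONFIG_PBTXT)
    return model_dir

if __name__ == "__main__":
    repo = make_model_repo(".")
    print(repo / "config.pbtxt")
```

Triton would then be pointed at this directory, e.g. `tritonserver --model-repository=./model_repository`.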
Stars: 14
Forks: —
Language: Shell
License: —
Category: —
Last pushed: Nov 18, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/detail-novelist/novelist-triton-server"
Open to everyone: 100 requests/day with no key; a free key raises the limit to 1,000/day.
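For scripting against the endpoint above without curl, a small Python sketch follows. The `/api/v1/quality/<category>/<owner>/<repo>` URL pattern is taken from the curl example; the response format is assumed to be JSON (the page does not document the schema), so `fetch_quality` simply decodes and returns the raw body.

```python
# Sketch: build the API URL shown above and fetch it with the stdlib.
# The /api/v1/quality/<category>/<owner>/<repo> path comes from the curl
# example; the JSON response shape is an assumption, not documented here.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def build_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL for one repository."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """GET the endpoint and decode the (assumed) JSON body."""
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(build_url("transformers", "detail-novelist", "novelist-triton-server"))
```

Note that unauthenticated calls are rate-limited to 100 requests/day, so cache responses rather than fetching per lookup.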
Higher-rated alternatives
openvinotoolkit/nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers...
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
huggingface/optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
eole-nlp/eole
Open language modeling toolkit based on PyTorch