detail-novelist/novelist-triton-server
Deploy KoGPT with Triton Inference Server
This project helps machine learning engineers and MLOps specialists deploy KakaoBrain's KoGPT Korean language model for efficient, real-time inference. It provides a structured way to take the raw KoGPT model weights and package them for serving with NVIDIA Triton Inference Server. The result is a production-ready server that can handle requests for text generation and other language tasks.
No commits in the last 6 months.
Use this if you need to deploy a KoGPT language model on NVIDIA GPUs and require fast, optimized inference performance in a production environment.
Not ideal if you are looking for a simple Python library for local model inference or if you don't have access to NVIDIA GPUs.
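As a rough sketch of what "setting up the model within the server" involves: Triton serves models from a versioned model repository on disk. The layout below follows Triton's documented convention (a model directory containing `config.pbtxt` plus numbered version subdirectories); the model name `kogpt`, the `python` backend, and the tensor names are illustrative assumptions, not details taken from this repository.

```python
# Sketch: build a minimal Triton model repository layout for a text model.
# The directory convention (model_name/config.pbtxt + numbered version dirs)
# is Triton's standard; the model name "kogpt" and the tensor names here
# are illustrative assumptions, not taken from this repository.
from pathlib import Path

CONFIG_PBTXT = """\
name: "kogpt"
backend: "python"
max_batch_size: 8
input [
  { name: "INPUT_TEXT", data_type: TYPE_STRING, dims: [ 1 ] }
]
output [
  { name: "OUTPUT_TEXT", data_type: TYPE_STRING, dims: [ 1 ] }
]
"""

def make_model_repo(root: str) -> Path:
    """Create model_repository/kogpt/{config.pbtxt, 1/} under root."""
    model_dir = Path(root) / "model_repository" / "kogpt"
    # Version directory "1" is where the weights / model.py would live.
    (model_dir / "1").mkdir(parents=True, exist_ok=True)
    (model_dir / "config.pbtxt").write_text(CONFIG_PBTXT)
    return model_dir

if __name__ == "__main__":
    repo = make_model_repo(".")
    print(repo / "config.pbtxt")
```

Triton would then be pointed at this directory, e.g. `tritonserver --model-repository=./model_repository`.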
Stars: 14
Forks: —
Language: Shell
License: —
Category: —
Last pushed: Nov 18, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/detail-novelist/novelist-triton-server"
Open to everyone: 100 requests/day with no key; a free key raises the limit to 1,000/day.
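For scripting against the endpoint above without curl, a small Python sketch follows. The `/api/v1/quality/<category>/<owner>/<repo>` URL pattern is taken from the curl example; the response format is assumed to be JSON (the page does not document the schema), so `fetch_quality` simply decodes and returns the raw body.

```python
# Sketch: build the API URL shown above and fetch it with the stdlib.
# The /api/v1/quality/<category>/<owner>/<repo> path comes from the curl
# example; the JSON response shape is an assumption, not documented here.
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def build_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL for one repository."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    """GET the endpoint and decode the (assumed) JSON body."""
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(build_url("transformers", "detail-novelist", "novelist-triton-server"))
```

Note that unauthenticated calls are rate-limited to 100 requests/day, so cache responses rather than fetching per lookup.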
Higher-rated alternatives
openvinotoolkit/nncf
Neural Network Compression Framework for enhanced OpenVINO™ inference
huggingface/optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers...
NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
huggingface/optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
eole-nlp/eole
Open language modeling toolkit based on PyTorch