huggingface/text-generation-inference

Large Language Model Text Generation Inference

Score: 82 / 100 (Verified)

Built in Rust with Python bindings and gRPC support, TGI implements continuous batching, tensor parallelism across GPUs, and optimized kernels using Flash Attention and Paged Attention for popular model architectures. It exposes OpenAI-compatible Chat Completion API endpoints alongside streaming generation via Server-Sent Events, with broad quantization support (bitsandbytes, GPTQ, AWQ, Marlin, fp8) and guidance features for constrained output formats. Although now in maintenance mode, it pioneered the move toward transformers-backed model implementations that downstream engines like vLLM and SGLang have since adopted.
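The OpenAI-compatible endpoint and SSE streaming mentioned above can be sketched as follows. This is a minimal example assuming a TGI server running locally on port 8080 (not something the text specifies); the `/v1/chat/completions` route and the payload fields follow the OpenAI-style Chat Completion API that TGI implements:

```python
import json
import urllib.request


def build_chat_request(base_url: str, prompt: str, stream: bool = True):
    """Build an OpenAI-compatible chat completion request for a TGI server."""
    url = f"{base_url}/v1/chat/completions"
    payload = {
        # TGI serves a single model per instance, so the name is informational
        "model": "tgi",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        # stream=True asks the server to reply with Server-Sent Events
        "stream": stream,
    }
    return url, payload


def send(url: str, payload: dict) -> bytes:
    """POST the JSON payload; requires a running TGI server."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


url, payload = build_chat_request("http://localhost:8080", "Hello!")
```

With `stream=True`, each SSE line arrives as `data: {...}` chunks that a client reads incrementally instead of waiting for the full completion.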

10,802 stars and 143,543 monthly downloads. Used by 3 other packages. Actively maintained with 1 commit in the last 30 days. Available on PyPI.

Maintenance: 13 / 25
Adoption: 23 / 25
Maturity: 25 / 25
Community: 21 / 25


Stars: 10,802
Forks: 1,261
Language: Python
License: Apache-2.0
Last pushed: Jan 08, 2026
Monthly downloads: 143,543
Commits (30d): 1
Dependencies: 3
Reverse dependents: 3

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/huggingface/text-generation-inference"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
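The same request can be made from Python. This is a sketch built from the curl example above: the URL pattern (`/api/v1/quality/{ecosystem}/{owner}/{repo}`) is taken from that command, but the shape of the JSON response is not documented here, so the fetch helper simply returns the parsed object:

```python
import json
import urllib.request


def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-API URL shown in the curl example."""
    return (
        "https://pt-edge.onrender.com/api/v1/quality/"
        f"{ecosystem}/{owner}/{repo}"
    )


def fetch_quality(url: str) -> dict:
    """Fetch and parse the report.

    Requires network access; unauthenticated callers are limited
    to 100 requests/day, or 1,000/day with a free API key.
    """
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


url = quality_url("transformers", "huggingface", "text-generation-inference")
```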