ENOT-AutoDL/gpt-j-6B-tensorrt-int8

GPT-J 6B inference on TensorRT with INT8 precision

Quality score: 13 / 100 (Experimental)

This project helps developers integrate GPT-J 6B, a large language model, into applications that need fast text generation on a small set of NVIDIA GPUs. It provides pre-optimized TensorRT model engines that take a text prompt and produce generated text. It is aimed at developers deploying real-time AI applications or services built on GPT-J 6B.
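To see why INT8 precision matters on consumer GPUs, here is a rough weight-only memory estimate. It assumes about 6.05 billion parameters for GPT-J 6B and 4, 2, or 1 bytes per weight for FP32, FP16, and INT8. Activations, the KV cache, and TensorRT workspace memory are ignored, so real usage is higher. The `weight_gib` helper is hypothetical and not part of this project.

```python
# Rough weight-only memory estimate for GPT-J 6B at different precisions.
# Assumption: ~6.05e9 parameters; activations, KV cache and TensorRT
# workspace memory are deliberately left out.
GPTJ_PARAMS_APPROX = 6_050_000_000

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_gib(num_params: int, precision: str) -> float:
    """Return the weight footprint in GiB (2**30 bytes) for a precision."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


for precision in ("fp32", "fp16", "int8"):
    print(f"{precision}: {weight_gib(GPTJ_PARAMS_APPROX, precision):.1f} GiB")
```

Under these assumptions the INT8 weights come to roughly 5.6 GiB, half the FP16 figure, while FP16 weights alone would be roughly as large as the entire 11 GB of memory on an RTX 2080 Ti.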

No commits in the last 6 months.

Use this if you are a developer looking to deploy GPT-J 6B for fast, efficient text generation on an NVIDIA RTX 2080 Ti, 3080 Ti, or 4090 GPU.
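Because the engines target specific cards, it can help to check the local GPU before deploying. TensorRT engines are generally tied to the GPU architecture they were built for, which is presumably why only these three cards are listed. The sketch below is an assumption-based helper, not part of this project: it does a naive case-insensitive substring match on the device name, and `local_gpu_names` reads names with `nvidia-smi --query-gpu=name --format=csv,noheader` (requires an NVIDIA driver and Python 3.9+).

```python
import subprocess

# Cards named in the project description (hypothetical helper data).
SUPPORTED_GPUS = ("RTX 2080 Ti", "RTX 3080 Ti", "RTX 4090")


def is_supported_gpu(device_name: str) -> bool:
    """Naive check: does the device name contain a supported card name?"""
    name = device_name.lower()
    return any(gpu.lower() in name for gpu in SUPPORTED_GPUS)


def local_gpu_names() -> list[str]:
    """List local GPU names as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

Usage: `any(is_supported_gpu(n) for n in local_gpu_names())`. Substring matching is deliberately simple; for example, it correctly rejects a plain "RTX 3080" but would accept any name that merely contains "RTX 4090".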

Not ideal if you need to run GPT-J 6B on different hardware or require an ONNX model for custom compilation, as those options are not yet available.

AI-development language-model-deployment GPU-optimization real-time-AI inference-acceleration
No License | Stale 6m | No Package | No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 0 / 25
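Assuming the overall score is the plain sum of the four sub-scores (each out of 25), the figures on this page are consistent: 0 + 5 + 8 + 0 = 13 out of 100. A minimal check of that arithmetic:

```python
# Sub-scores as listed on this page; each category is out of 25.
SUBSCORES = {"Maintenance": 0, "Adoption": 5, "Maturity": 8, "Community": 0}
MAX_PER_CATEGORY = 25

total = sum(SUBSCORES.values())
maximum = MAX_PER_CATEGORY * len(SUBSCORES)
print(f"{total} / {maximum}")
```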


Stars: 11
Forks: not listed
Language: Python
License: none
Last pushed: Apr 05, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ENOT-AutoDL/gpt-j-6B-tensorrt-int8"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
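The same request can be made from Python with only the standard library. This is a sketch under two assumptions not stated on this page: that the endpoint returns JSON, and that other repositories follow the same `transformers/<owner>/<name>` path. It sends no API key, since the key mechanism is not described here; `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request

# Base path taken from the curl example; reuse for other repos is assumed.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(repo: str) -> str:
    """Build the endpoint URL for an 'owner/name' repository slug."""
    return f"{API_BASE}/{repo}"


def fetch_quality(repo: str, timeout: float = 10.0):
    """GET the quality record (keyless tier) and decode it as JSON."""
    with urllib.request.urlopen(quality_url(repo), timeout=timeout) as resp:
        return json.load(resp)
```

Usage: `fetch_quality("ENOT-AutoDL/gpt-j-6B-tensorrt-int8")` performs the same GET as the curl command above.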