ENOT-AutoDL/gpt-j-6B-tensorrt-int8
GPT-J 6B inference on TensorRT with INT-8 precision
This project helps developers integrate GPT-J 6B, a large language model, into applications that need high-speed text generation on specific NVIDIA GPUs. It provides pre-optimized TensorRT engines that take a text prompt and return generated text. It is aimed at developers deploying GPT-J 6B in real-time AI applications or services.
No commits in the last 6 months.
Use this if you are a developer looking to deploy GPT-J 6B for fast, efficient text generation on an NVIDIA RTX 2080 Ti, 3080 Ti, or 4090 GPU.
Not ideal if you need to run GPT-J 6B on different hardware or require an ONNX model for custom compilation, as those options are not yet available.
Stars: 11
Forks: —
Language: Python
License: —
Category:
Last pushed: Apr 05, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ENOT-AutoDL/gpt-j-6B-tensorrt-int8"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
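For scripted use, the same request can be made from Python. The sketch below is a minimal, non-authoritative example: the base URL and path layout are copied from the curl command above, while the helper names `build_quality_url` and `fetch_quality` are hypothetical, the meaning of the `transformers` path segment is inferred from the URL only, and the assumption that the endpoint returns JSON is not confirmed by this page.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble the endpoint URL from its three path segments.

    The segment names (category/owner/repo) are an assumption based on
    the shape of the example URL, not on documented API parameters.
    """
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes the response body is JSON; adjust the decoding if it is not.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "ENOT-AutoDL", "gpt-j-6B-tensorrt-int8")` issues the same GET request as the curl command. With the keyless limit of 100 requests/day, caching the result locally is sensible when polling many repositories.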
Higher-rated alternatives
tabularis-ai/be_great
A novel approach for synthesizing tabular data using pretrained large language models
EleutherAI/gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron...
shibing624/textgen
TextGen: Implementation of Text Generation models, include LLaMA, BLOOM, GPT2, BART, T5, SongNet...
ai-forever/ru-gpts
Russian GPT3 models.
AdityaNG/kan-gpt
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold...