ENOT-AutoDL/gpt-j-6B-tensorrt-int8

GPT-J 6B inference on TensorRT with INT8 precision

13 / 100

Experimental

This project helps developers integrate GPT-J 6B, a large language model, into applications that need high-speed text generation on supported NVIDIA GPUs (RTX 2080 Ti, 3080 Ti, or 4090). It provides pre-optimized TensorRT engines with INT8 precision that take text prompts and return generated text. It is aimed at developers deploying GPT-J 6B in real-time AI applications or services.

No commits in the last 6 months.

Use this if you are a developer looking to deploy GPT-J 6B for fast, efficient text generation on an NVIDIA RTX 2080 Ti, 3080 Ti, or 4090 GPU.
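TensorRT engines are generally built for a specific GPU architecture, which is why the supported cards are listed by name. Below is a minimal sketch of a pre-flight check, assuming `nvidia-smi` is on the PATH. The names `SUPPORTED_GPUS`, `is_supported`, and `local_gpu_names` are hypothetical and not part of the project, and the substring match is a rough heuristic rather than an official compatibility test.

```python
import shutil
import subprocess

# GPUs named in the project summary; the matching rule below is an assumption.
SUPPORTED_GPUS = ("RTX 2080 Ti", "RTX 3080 Ti", "RTX 4090")


def is_supported(gpu_name: str) -> bool:
    """Return True if the reported GPU name contains a supported model name."""
    normalized = " ".join(gpu_name.split()).lower()
    return any(model.lower() in normalized for model in SUPPORTED_GPUS)


def local_gpu_names() -> list:
    """List GPU names reported by nvidia-smi, or [] if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    names = local_gpu_names()
    if not names:
        print("No NVIDIA GPU detected via nvidia-smi.")
    for name in names:
        status = "supported" if is_supported(name) else "not listed as supported"
        print(f"{name}: {status}")
```

Running the script on the target machine prints one line per detected GPU; a card reported as "not listed as supported" may still work, but the summary gives no indication that it will.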

Not ideal if you need to run GPT-J 6B on other hardware or need an ONNX model for custom compilation; neither option is available yet.

AI-development, language-model-deployment, GPU-optimization, real-time-AI, inference-acceleration

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 0 / 25

Stars

Forks

—

Language: Python

License: —

Higher-rated alternatives

tabularis-ai/be_great

A novel approach for synthesizing tabular data using pretrained large language models

EleutherAI/gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron...

shibing624/textgen

TextGen: Implementation of Text Generation models, include LLaMA, BLOOM, GPT2, BART, T5, SongNet...

ai-forever/ru-gpts

Russian GPT3 models.

AdityaNG/kan-gpt

The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold...
