---
license: apache-2.0
datasets:
  - lambada
language:
  - en
library_name: transformers
pipeline_tag: text-generation
tags:
  - text-generation-inference
  - causal-lm
  - int8
  - tensorrt
  - ENOT-AutoDL
---

# INT8 GPT-J 6B

GPT-J 6B is a transformer model trained using Ben Wang's Mesh Transformer JAX. "GPT-J" refers to the class of model, while "6B" represents the number of trainable parameters.

This repository contains TensorRT engines built with mixed precision (INT8 + FP32). Prebuilt engines are available for the following GPUs:

- RTX 4090
- RTX 3080 Ti
- RTX 2080 Ti
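Each engine is tied to the GPU architecture it was built for, since serialized TensorRT engines are generally not portable across devices. A minimal sketch of selecting the matching prebuilt engine at runtime (the filenames and the `select_engine` helper are hypothetical illustrations, not part of this repository):

```python
# Hypothetical mapping from GPU model to a prebuilt engine file.
# The filenames below are illustrative; check the repository for the real ones.
ENGINES = {
    "RTX 4090": "gptj6b-int8-rtx4090.engine",
    "RTX 3080 Ti": "gptj6b-int8-rtx3080ti.engine",
    "RTX 2080 Ti": "gptj6b-int8-rtx2080ti.engine",
}

def select_engine(gpu_name: str) -> str:
    """Return the engine file whose GPU model appears in gpu_name.

    TensorRT engines are built per-architecture, so an engine built
    for one GPU generally cannot be deserialized on another.
    """
    for model, path in ENGINES.items():
        if model in gpu_name:
            return path
    raise ValueError(f"No prebuilt engine for GPU: {gpu_name!r}")
```

For example, `select_engine("NVIDIA GeForce RTX 4090")` would return the 4090 engine path.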

The ONNX model was generated by ENOT-AutoDL and will be published soon.

## Test results

| Metric          | INT8   | FP32   |
|:----------------|-------:|-------:|
| LAMBADA accuracy | 78.50% | 79.54% |
| Model size (GB) | 8.5    | 24.2   |
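In other words, quantization trades about one point of LAMBADA accuracy for roughly a 2.85x reduction in model size. A quick check of those numbers:

```python
# Numbers taken from the results table above.
int8_size_gb, fp32_size_gb = 8.5, 24.2
int8_acc, fp32_acc = 0.7850, 0.7954

compression = fp32_size_gb / int8_size_gb  # size reduction factor, ~2.85x
acc_drop = fp32_acc - int8_acc             # absolute accuracy drop, ~1 point

print(f"compression: {compression:.2f}x, accuracy drop: {acc_drop * 100:.2f} pts")
```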

## How to use

An inference example and an accuracy test are published on GitHub:

```shell
git clone https://github.com/ENOT-AutoDL/demo-gpt-j-6B-tensorrt-int8
```