GPT2

This repository contains GPT2 ONNX models compatible with TensorRT:

  • gpt2-xl.onnx - GPT2-XL ONNX model for building FP32 or FP16 engines
  • gpt2-xl-i8.onnx - GPT2-XL ONNX model for building INT8+FP32 engines

The models were quantized with the ENOT-AutoDL framework. Code for building TensorRT engines, along with examples, is published on GitHub.
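The repository's GitHub code is authoritative for engine building; as a rough sketch, engines can be built from these ONNX files with trtexec, the command-line tool shipped with TensorRT. The flags below are common defaults, not necessarily the exact ones used by ENOT-AutoDL, and the INT8 command assumes the quantized ONNX carries its own Q/DQ quantization scales (so no separate calibration cache is needed):

```shell
# FP16 engine from the full-precision ONNX model
trtexec --onnx=gpt2-xl.onnx --fp16 --saveEngine=gpt2-xl-fp16.engine

# INT8+FP32 engine from the quantized ONNX model
# (assumes quantization parameters are embedded in the ONNX graph)
trtexec --onnx=gpt2-xl-i8.onnx --int8 --saveEngine=gpt2-xl-i8.engine
```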

Metrics:

GPT2-XL

Metric        TensorRT INT8+FP32   torch FP16
LAMBADA acc.  72.11%               71.43%

Test environment

  • GPU RTX 4090
  • CPU 11th Gen Intel(R) Core(TM) i7-11700K
  • TensorRT 8.5.3.1
  • pytorch 1.13.1+cu116

Latency:

GPT2-XL

Input sequence length  Generated tokens  TensorRT INT8+FP32, ms  torch FP16, ms  Acceleration
64                     64                462                     1190            2.58x
64                     128               920                     2360            2.54x
64                     256               1890                    4710            2.54x
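The acceleration column is simply torch FP16 latency divided by TensorRT INT8+FP32 latency; recomputing it from the raw latencies gives values that round slightly differently from the published column, but every row lands around 2.5x:

```python
# Latency rows from the table above:
# (input length, generated tokens, TensorRT INT8+FP32 ms, torch FP16 ms)
latencies = [
    (64, 64, 462, 1190),
    (64, 128, 920, 2360),
    (64, 256, 1890, 4710),
]

for seq_len, gen_tokens, trt_ms, torch_ms in latencies:
    speedup = torch_ms / trt_ms  # acceleration of TensorRT over torch
    print(f"in={seq_len:<4} gen={gen_tokens:<4} speedup={speedup:.2f}x")
```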


How to use

An inference example and an accuracy test are published on GitHub:

git clone https://github.com/ENOT-AutoDL/ENOT-transformers
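After cloning, setup typically looks like the sketch below; the install step is an assumption, so follow the repository's own README for the actual instructions:

```shell
cd ENOT-transformers
pip install -r requirements.txt  # assumed; check the repository README
```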