---
license: apache-2.0
datasets:
- lambada
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
- causal-lm
- int8
- tensorrt
- ENOT-AutoDL
---

# GPT2

This repository contains GPT2 ONNX models compatible with TensorRT:

* gpt2-xl.onnx - GPT2-XL ONNX model for building FP32 or FP16 engines
* gpt2-xl-i8.onnx - GPT2-XL ONNX model for building INT8+FP32 engines

The models were quantized with the [ENOT-AutoDL](https://pypi.org/project/enot-autodl/) framework. Code for building TensorRT engines, together with examples, is published on [GitHub](https://github.com/ENOT-AutoDL/ENOT-transformers).

## Metrics

### GPT2-XL

| | TensorRT INT8+FP32 | torch FP16 |
|---|:---:|:---:|
| **Lambada Acc** | 72.11% | 71.43% |

### Test environment

* GPU: RTX 4090
* CPU: 11th Gen Intel(R) Core(TM) i7-11700K
* TensorRT 8.5.3.1
* PyTorch 1.13.1+cu116

## Latency

### GPT2-XL

| Input sequence length | Number of generated tokens | TensorRT INT8+FP32, ms | torch FP16, ms | Acceleration |
|:---:|:---:|:---:|:---:|:---:|
| 64 | 64 | 462 | 1190 | 2.58 |
| 64 | 128 | 920 | 2360 | 2.54 |
| 64 | 256 | 1890 | 4710 | 2.54 |

### Test environment

* GPU: RTX 4090
* CPU: 11th Gen Intel(R) Core(TM) i7-11700K
* TensorRT 8.5.3.1
* PyTorch 1.13.1+cu116

## How to use

An example of inference and an accuracy test is [published on GitHub](https://github.com/ENOT-AutoDL/ENOT-transformers):

```shell
git clone https://github.com/ENOT-AutoDL/ENOT-transformers
```
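For a quick smoke test of the exported model outside TensorRT, a greedy-decoding loop can be run directly against the ONNX file with ONNX Runtime. This is only a sketch: the input name `input_ids` and a single `logits` output are assumptions about the exported graph (the repository's own examples are the authoritative reference), and the decoding loop below is a generic helper, not code from this repo.

```python
import numpy as np

def greedy_generate(session, input_ids, n_tokens):
    """Append n_tokens greedily chosen tokens to input_ids.

    `session` is expected to follow the onnxruntime.InferenceSession
    interface: session.run(None, feeds) -> [logits] with logits of
    shape (batch, seq_len, vocab_size). The feed name "input_ids" is
    an assumption about the exported GPT2 graph.
    """
    ids = np.asarray(input_ids, dtype=np.int64)[None, :]  # add batch dim
    for _ in range(n_tokens):
        (logits,) = session.run(None, {"input_ids": ids})
        # Pick the highest-scoring token at the last position.
        next_id = int(np.argmax(logits[0, -1]))
        ids = np.concatenate([ids, [[next_id]]], axis=1)
    return ids[0].tolist()
```

With ONNX Runtime installed, a session could be created with `onnxruntime.InferenceSession("gpt2-xl.onnx")` and passed in together with tokenized input; note that re-running the full sequence each step, as above, is simple but slower than the cached-attention path the TensorRT engines use.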