Commit 918550e (parent 996a215) by ivkalgin: Update README.md
Files changed: README.md (+60 −0)
---
license: apache-2.0
datasets:
- lambada
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- text-generation-inference
- causal-lm
- int8
- tensorrt
- ENOT-AutoDL
---

# GPT2

This repository contains GPT2 ONNX models compatible with TensorRT:
* gpt2-xl.onnx - GPT2-XL ONNX model for FP32 or FP16 engines
* gpt2-xl-i8.onnx - GPT2-XL ONNX model for INT8+FP32 engines

The models were quantized with the [ENOT-AutoDL](https://pypi.org/project/enot-autodl/) framework.
Code for building TensorRT engines, together with examples, is published on [github](https://github.com/ENOT-AutoDL/ENOT-transformers).
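
Engines could then be built with TensorRT's `trtexec` tool. The commands below are only a sketch, assuming `trtexec` is on `PATH`; the engine file names are hypothetical, and the repository linked above contains the actual build code:

```shell
# Build an FP16 engine from the full-precision model (output name is hypothetical)
trtexec --onnx=gpt2-xl.onnx --fp16 --saveEngine=gpt2-xl-fp16.plan

# Build an INT8+FP32 engine from the quantized model
trtexec --onnx=gpt2-xl-i8.onnx --int8 --saveEngine=gpt2-xl-i8.plan
```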

## Metrics

### GPT2-XL

| |TensorRT INT8+FP32|torch FP16|
|---|:---:|:---:|
| **Lambada Acc** |72.11%|71.43%|
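
Lambada accuracy measures how often the model predicts the correct last word of a passage. As a minimal illustration of how such a figure is computed (the prediction and target lists here are made up, not actual model outputs):

```python
# Lambada-style accuracy: fraction of examples whose predicted
# final word exactly matches the reference word.

def lambada_accuracy(predictions, targets):
    """Return the fraction of matching prediction/target pairs."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

# Hypothetical illustration data: 3 of 4 predictions are correct.
preds   = ["dog", "night", "window", "run"]
targets = ["dog", "night", "door",   "run"]
print(f"{lambada_accuracy(preds, targets):.2%}")  # -> 75.00%
```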

### Test environment

* GPU: RTX 4090
* CPU: 11th Gen Intel(R) Core(TM) i7-11700K
* TensorRT 8.5.3.1
* pytorch 1.13.1+cu116

## Latency

### GPT2-XL

|Input sequence length|Number of generated tokens|TensorRT INT8+FP32, ms|torch FP16, ms|Acceleration|
|:---:|:---:|:---:|:---:|:---:|
|64|64|462|1190|2.58|
|64|128|920|2360|2.54|
|64|256|1890|4710|2.54|
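
The Acceleration column appears to be the ratio of torch FP16 latency to TensorRT INT8+FP32 latency; a quick check against the first row, using the values from the table above:

```python
# Acceleration = torch FP16 latency / TensorRT INT8+FP32 latency.
# The values below come from the 64-token-input, 64-generated-token row.

def acceleration(torch_ms: float, trt_ms: float) -> float:
    """Speedup of the TensorRT engine relative to the torch baseline."""
    return torch_ms / trt_ms

print(round(acceleration(torch_ms=1190, trt_ms=462), 2))  # -> 2.58
```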

### Test environment

* GPU: RTX 4090
* CPU: 11th Gen Intel(R) Core(TM) i7-11700K
* TensorRT 8.5.3.1
* pytorch 1.13.1+cu116
58
+ ## How to use
59
+
60
+ Example of inference and accuracy test [published on github](https://github.com/ENOT-AutoDL/ENOT-transformers):
61
+ ```shell
62
+ git clone https://github.com/ENOT-AutoDL/ENOT-transformers
63
+ ```