Update README.md

README.md (CHANGED)
@@ -24,7 +24,7 @@ This model is the ONNX version of [https://huggingface.co/SamLowe/roberta-base-g
 
 - that has identical accuracy/metrics to the original Transformers model
 - and has the same model size (499MB)
-- is faster
+- is faster in inference than normal Transformers, particularly for smaller batch sizes
 - in my tests about 2x to 3x as fast for a batch size of 1 on an 8 core 11th gen i7 CPU using ONNXRuntime
 
 ### Quantized (INT8) ONNX version
@@ -33,7 +33,7 @@ This model is the ONNX version of [https://huggingface.co/SamLowe/roberta-base-g
 
 - that is one quarter the size (125MB) of the full precision model (above)
 - but delivers almost all of the accuracy
-- is faster
+- is faster in inference than both the full precision ONNX above and the normal Transformers model
 - about 2x as fast for a batch size of 1 on an 8 core 11th gen i7 CPU using ONNXRuntime vs the full precision model above
 - which makes it circa 5x as fast as the full precision normal Transformers model (on the above mentioned CPU, for a batch of 1)
 
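As a minimal sketch of how the ONNX model described in the diff above might be used, assuming ONNXRuntime is installed and the exported model file is available locally (the file name `model.onnx` and the tokenizer name are assumptions, not confirmed by this commit). Since go_emotions is a multi-label task, raw logits are typically mapped to per-label probabilities with a sigmoid rather than a softmax; the runnable part below demonstrates only that post-processing step on mock logits.

```python
import numpy as np

def sigmoid(x):
    # Element-wise sigmoid: maps each logit to an independent
    # probability, as used for multi-label classification.
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical end-to-end usage (requires network/model files, so it is
# left as a comment; names here are assumptions):
#
#   import onnxruntime as ort
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("SamLowe/roberta-base-go_emotions")
#   sess = ort.InferenceSession("model.onnx")
#   enc = tok("I am so happy today!", return_tensors="np")
#   logits = sess.run(None, {k: v for k, v in enc.items()})[0]
#   probs = sigmoid(logits)

# Offline demonstration of the post-processing with mock logits
# for three hypothetical labels:
mock_logits = np.array([2.0, 0.0, -2.0])
probs = sigmoid(mock_logits)
print(np.round(probs, 3))
```

The sigmoid keeps each label's score independent, so several emotions can exceed a chosen threshold at once, which is the intended behavior for this multi-label model.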