---
tags:
- vit
datasets:
- AI-Lab-Makerere/beans
metrics:
- accuracy
- total_time_in_seconds
- samples_per_second
- latency_in_seconds
pipeline_tag: image-classification
---

**Task:** `image-classification`  
**Backend:** `sagemaker-training`  
**Backend args:** `{'instance_type': 'ml.m5.2xlarge', 'supported_instructions': 'avx512'}`  
**Number of evaluation samples:** `All dataset`  

Fixed parameters:
* **dataset**: [{'path': 'beans', 'eval_split': 'validation', 'data_keys': {'primary': 'image'}, 'ref_keys': ['labels'], 'name': None, 'calibration_split': 'train'}]
* **name_or_path**: `nateraw/vit-base-beans`
* **from_transformers**: `True`
* **node_exclusion**: `[]`
* **calibration**:
    * **method**: `percentile`
    * **num_calibration_samples**: `128`
    * **calibration_histogram_percentile**: `99.999`
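As a rough illustration of what a `percentile` calibration method with a percentile of `99.999` does, the sketch below picks a clipping threshold that covers 99.999% of the observed activation magnitudes, so that a single extreme outlier does not stretch the quantization range. This is a simplified, standard-library approximation under stated assumptions, not the calibrator actually used for this benchmark: the helper names `percentile_threshold` and `quantize_int8` are hypothetical, the data is synthetic, and a real calibrator typically works on histograms collected from the calibration samples rather than on a sorted list.

```python
import math
import random


def percentile_threshold(values, percentile):
    """Nearest-rank percentile of the absolute values (hypothetical helper)."""
    magnitudes = sorted(abs(v) for v in values)
    # Smallest rank whose magnitudes cover `percentile` percent of the samples.
    rank = math.ceil(percentile / 100.0 * len(magnitudes))
    return magnitudes[max(rank, 1) - 1]


def quantize_int8(value, threshold):
    """Symmetric int8 quantization that clips to [-threshold, threshold]."""
    scale = threshold / 127.0
    quantized = round(value / scale)
    return max(-127, min(127, quantized))


# Synthetic activations: 100,000 Gaussian samples plus one extreme outlier.
random.seed(0)
activations = [random.gauss(0.0, 1.0) for _ in range(100_000)]
activations.append(1000.0)

# With 100,001 samples, the 99.999th percentile excludes only the outlier.
threshold = percentile_threshold(activations, 99.999)
print(f"clipping threshold: {threshold:.3f}")
print("outlier quantizes to:", quantize_int8(1000.0, threshold))
```

The outlier saturates at the edge of the int8 range, while the bulk of the distribution keeps a fine-grained scale.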

Benchmarked parameters:
* **framework**: `onnxruntime`,  `pytorch`
* **quantization_approach**: `dynamic`,  `static`
* **operators_to_quantize**: `['Add', 'MatMul']`,  `['Add']`
* **per_channel**: `False`,  `True`
* **framework_args**: `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}`,  `{}`
* **reduce_range**: `True`,  `False`
* **apply_quantization**: `True`,  `False`
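The benchmarked parameters above combine into the 18 configurations reported in the evaluation tables: one unquantized `onnxruntime` run, 16 quantized `onnxruntime` runs (2 approaches × 2 operator sets × 2 `per_channel` values × 2 `reduce_range` values), and one `pytorch` baseline. The following sketch enumerates that grid; the function name `build_configs` and the dictionary layout are assumptions made for illustration and are not taken from the benchmarking tool itself.

```python
from itertools import product

# Framework arguments shared by every onnxruntime run in this card.
ORT_ARGS = {"opset": 13, "optimization_level": 1, "intra_op_num_threads": 4}


def build_configs():
    """Enumerate the benchmarked configurations (hypothetical layout)."""
    # Unquantized onnxruntime baseline.
    configs = [
        {
            "framework": "onnxruntime",
            "framework_args": ORT_ARGS,
            "apply_quantization": False,
        }
    ]
    # Every quantized onnxruntime combination.
    for approach, operators, per_channel, reduce_range in product(
        ["dynamic", "static"],
        [["Add", "MatMul"], ["Add"]],
        [False, True],
        [False, True],
    ):
        configs.append(
            {
                "framework": "onnxruntime",
                "framework_args": ORT_ARGS,
                "apply_quantization": True,
                "quantization_approach": approach,
                "operators_to_quantize": operators,
                "per_channel": per_channel,
                "reduce_range": reduce_range,
            }
        )
    # PyTorch baseline, which takes no quantization parameters.
    configs.append({"framework": "pytorch", "framework_args": {}})
    return configs


configs = build_configs()
print(len(configs), "configurations")
```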

# Evaluation
## Non-time metrics
|   framework   | quantization_approach | operators_to_quantize | per_channel |                           framework_args                            | reduce_range | apply_quantization |     | accuracy |
| :-----------: | :-------------------: | :-------------------: | :---------: | :-----------------------------------------------------------------: | :----------: | :----------------: | :-: | :------: |
| `onnxruntime` |        `None`         |        `None`         |   `None`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `None`    |      `False`       |  \|  |  0.977   |
| `onnxruntime` |       `dynamic`       |  `['Add', 'MatMul']`  |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |  0.977   |
| `onnxruntime` |       `dynamic`       |  `['Add', 'MatMul']`  |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |  0.977   |
| `onnxruntime` |       `dynamic`       |  `['Add', 'MatMul']`  |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |  0.977   |
| `onnxruntime` |       `dynamic`       |  `['Add', 'MatMul']`  |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |  0.977   |
| `onnxruntime` |       `dynamic`       |       `['Add']`       |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |  0.977   |
| `onnxruntime` |       `dynamic`       |       `['Add']`       |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |  0.977   |
| `onnxruntime` |       `dynamic`       |       `['Add']`       |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |  0.977   |
| `onnxruntime` |       `dynamic`       |       `['Add']`       |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |  0.977   |
| `onnxruntime` |       `static`        |  `['Add', 'MatMul']`  |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |  0.421   |
| `onnxruntime` |       `static`        |  `['Add', 'MatMul']`  |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |  0.421   |
| `onnxruntime` |       `static`        |  `['Add', 'MatMul']`  |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |  0.316   |
| `onnxruntime` |       `static`        |  `['Add', 'MatMul']`  |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |  0.451   |
| `onnxruntime` |       `static`        |       `['Add']`       |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |  0.361   |
| `onnxruntime` |       `static`        |       `['Add']`       |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |  0.361   |
| `onnxruntime` |       `static`        |       `['Add']`       |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |  0.361   |
| `onnxruntime` |       `static`        |       `['Add']`       |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |  0.361   |
|   `pytorch`   |        `None`         |        `None`         |   `None`    |                                `{}`                                 |    `None`    |       `None`       |  \|  |  0.977   |

## Time metrics
Time benchmarks were run for 15 seconds per config.


The table below reports time metrics for batch size = 1 and input length = 224.

|   framework   | quantization_approach | operators_to_quantize | per_channel |                           framework_args                            | reduce_range | apply_quantization |     | latency_mean (ms) |     | throughput (/s) |
| :-----------: | :-------------------: | :-------------------: | :---------: | :-----------------------------------------------------------------: | :----------: | :----------------: | :-: | :---------------: | :-: | :-------------: |
| `onnxruntime` |        `None`         |        `None`         |   `None`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `None`    |      `False`       |  \|  |      130.41       |  \|  |      7.73       |
| `onnxruntime` |       `dynamic`       |  `['Add', 'MatMul']`  |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |      102.44       |  \|  |      9.80       |
| `onnxruntime` |       `dynamic`       |  `['Add', 'MatMul']`  |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |      101.57       |  \|  |      9.87       |
| `onnxruntime` |       `dynamic`       |  `['Add', 'MatMul']`  |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |      102.37       |  \|  |      9.80       |
| `onnxruntime` |       `dynamic`       |  `['Add', 'MatMul']`  |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |      102.36       |  \|  |      9.80       |
| `onnxruntime` |       `dynamic`       |       `['Add']`       |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |      130.67       |  \|  |      7.67       |
| `onnxruntime` |       `dynamic`       |       `['Add']`       |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |      131.29       |  \|  |      7.67       |
| `onnxruntime` |       `dynamic`       |       `['Add']`       |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |      132.65       |  \|  |      7.60       |
| `onnxruntime` |       `dynamic`       |       `['Add']`       |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |      131.03       |  \|  |      7.67       |
| `onnxruntime` |       `static`        |  `['Add', 'MatMul']`  |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |      127.99       |  \|  |      7.87       |
| `onnxruntime` |       `static`        |  `['Add', 'MatMul']`  |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |      128.27       |  \|  |      7.80       |
| `onnxruntime` |       `static`        |  `['Add', 'MatMul']`  |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |      131.10       |  \|  |      7.67       |
| `onnxruntime` |       `static`        |  `['Add', 'MatMul']`  |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |      130.29       |  \|  |      7.73       |
| `onnxruntime` |       `static`        |       `['Add']`       |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |      164.55       |  \|  |      6.13       |
| `onnxruntime` |       `static`        |       `['Add']`       |   `False`   | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |      168.61       |  \|  |      5.93       |
| `onnxruntime` |       `static`        |       `['Add']`       |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |   `False`    |       `True`       |  \|  |      164.52       |  \|  |      6.13       |
| `onnxruntime` |       `static`        |       `['Add']`       |   `True`    | `{'opset': 13, 'optimization_level': 1, 'intra_op_num_threads': 4}` |    `True`    |       `True`       |  \|  |      165.31       |  \|  |      6.07       |
|   `pytorch`   |        `None`         |        `None`         |   `None`    |                                `{}`                                 |    `None`    |       `None`       |  \|  |      149.23       |  \|  |      6.73       |
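To compare rows of the table above more directly, mean latencies can be turned into speedup factors relative to the unquantized `onnxruntime` run. The sketch below does this for a few rows copied from the table (quantized rows use `per_channel=False`, `reduce_range=False`); the `speedup` helper and the dictionary labels are hypothetical names chosen for illustration.

```python
# Mean latencies in milliseconds, copied from the time metrics table.
latency_ms = {
    "onnxruntime, no quantization": 130.41,
    "onnxruntime, dynamic, ['Add', 'MatMul']": 102.44,
    "onnxruntime, static, ['Add']": 164.55,
    "pytorch": 149.23,
}


def speedup(baseline_ms, candidate_ms):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


baseline = latency_ms["onnxruntime, no quantization"]
for name, value in latency_ms.items():
    # Values above 1 are faster than the baseline, values below 1 are slower.
    print(f"{name}: {speedup(baseline, value):.2f}x")
```

A factor above 1 means the configuration is faster than unquantized `onnxruntime`, and a factor below 1 means it is slower.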