task: image-classification
Backend: sagemaker-training
Backend args: {'instance_type': 'ml.g4dn.2xlarge', 'supported_instructions': None}
Number of evaluation samples: all (entire dataset)

Fixed parameters:

model_name_or_path: nateraw/vit-base-beans
dataset:
- path: beans
- eval_split: validation
- data_keys: {'primary': 'image'}
- ref_keys: ['labels']
quantization_approach: dynamic
node_exclusion: []
framework: onnxruntime
framework_args:
- opset: 11
- optimization_level: 1
aware_training: False

Benchmarked parameters:

operators_to_quantize: ['Add', 'MatMul'], ['Add'], []
per_channel: False, True
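
The two benchmarked parameters above form a grid of 3 x 2 = 6 configurations, which is why each table in this report has six rows. A minimal sketch of enumerating that grid in Python follows; the variable names and the dictionary layout are illustrative assumptions, not the benchmarking tool's actual code.

```python
from itertools import product

# Benchmarked values, copied from the "Benchmarked parameters" list.
operators_to_quantize = [["Add", "MatMul"], ["Add"], []]
per_channel = [False, True]

# One dict per configuration; the layout is an illustrative assumption.
configs = [
    {"operators_to_quantize": ops, "per_channel": pc}
    for ops, pc in product(operators_to_quantize, per_channel)
]

for config in configs:
    print(config)
```

The iteration order matches the row order used in the tables: `operators_to_quantize` varies slowest and `per_channel` fastest.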

Evaluation

Non-time metrics

operators_to_quantize	per_channel	|	accuracy (original)	accuracy (optimized)
`['Add', 'MatMul']`	`False`	|	0.980	0.980
`['Add', 'MatMul']`	`True`	|	0.980	0.980
`['Add']`	`False`	|	0.980	0.980
`['Add']`	`True`	|	0.980	0.980
`[]`	`False`	|	0.980	0.980
`[]`	`True`	|	0.980	0.980

Time metrics

Time benchmarks were run for 15 seconds per config.

Below, time metrics for batch size = 1, input length = 32.

operators_to_quantize	per_channel	|	latency_mean (original, ms)	latency_mean (optimized, ms)	|	throughput (original, /s)	throughput (optimized, /s)
`['Add', 'MatMul']`	`False`	|	201.25	70.30	|	5.00	14.27
`['Add', 'MatMul']`	`True`	|	203.52	72.48	|	4.93	13.80
`['Add']`	`False`	|	166.03	150.93	|	6.07	6.67
`['Add']`	`True`	|	200.82	163.17	|	5.00	6.13
`[]`	`False`	|	190.99	162.06	|	5.27	6.20
`[]`	`True`	|	155.15	162.52	|	6.47	6.20
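
As a reading aid for the time tables, the sketch below derives a speedup factor from one row of the table above (batch size = 1, input length = 32, `['Add', 'MatMul']`, `per_channel` = `False`). It also checks an assumption that is not stated in the report: that throughput is the number of completed runs divided by the 15-second benchmark window. The variable names are illustrative.

```python
# Values copied from the batch size = 1, input length = 32 table,
# row operators_to_quantize=['Add', 'MatMul'], per_channel=False.
latency_original_ms = 201.25
latency_optimized_ms = 70.30
throughput_optimized = 14.27  # runs per second, as reported
benchmark_seconds = 15        # "15 seconds per config"

# Speedup: how many times faster the quantized model is on average.
speedup = latency_original_ms / latency_optimized_ms

# Relative latency reduction, as a fraction of the original latency.
latency_reduction = 1 - latency_optimized_ms / latency_original_ms

# Assumption: throughput = completed runs / benchmark window.
estimated_runs = round(throughput_optimized * benchmark_seconds)

print(f"speedup: {speedup:.2f}x")
print(f"latency reduction: {latency_reduction:.1%}")
print(f"estimated runs in {benchmark_seconds} s: {estimated_runs}")
```

Under that assumption the reported throughput corresponds to a whole number of runs in the window, which would also explain why the original-model throughput values cluster on a few repeated numbers such as 5.00, 6.07 and 6.20.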

Below, time metrics for batch size = 1, input length = 64.

operators_to_quantize	per_channel	|	latency_mean (original, ms)	latency_mean (optimized, ms)	|	throughput (original, /s)	throughput (optimized, /s)
`['Add', 'MatMul']`	`False`	|	165.85	70.60	|	6.07	14.20
`['Add', 'MatMul']`	`True`	|	161.41	72.71	|	6.20	13.80
`['Add']`	`False`	|	200.45	129.40	|	5.00	7.73
`['Add']`	`True`	|	154.68	136.42	|	6.47	7.40
`[]`	`False`	|	166.97	162.15	|	6.00	6.20
`[]`	`True`	|	166.32	162.81	|	6.07	6.20

Below, time metrics for batch size = 1, input length = 128.

operators_to_quantize	per_channel	|	latency_mean (original, ms)	latency_mean (optimized, ms)	|	throughput (original, /s)	throughput (optimized, /s)
`['Add', 'MatMul']`	`False`	|	199.48	70.98	|	5.07	14.13
`['Add', 'MatMul']`	`True`	|	199.65	71.78	|	5.07	13.93
`['Add']`	`False`	|	199.08	137.97	|	5.07	7.27
`['Add']`	`True`	|	189.93	162.45	|	5.33	6.20
`[]`	`False`	|	191.63	162.54	|	5.27	6.20
`[]`	`True`	|	200.38	162.55	|	5.00	6.20

Below, time metrics for batch size = 4, input length = 32.

operators_to_quantize	per_channel	|	latency_mean (original, ms)	latency_mean (optimized, ms)	|	throughput (original, /s)	throughput (optimized, /s)
`['Add', 'MatMul']`	`False`	|	655.84	243.33	|	1.53	4.13
`['Add', 'MatMul']`	`True`	|	661.27	221.16	|	1.53	4.53
`['Add']`	`False`	|	662.84	529.28	|	1.53	1.93
`['Add']`	`True`	|	512.47	470.66	|	2.00	2.13
`[]`	`False`	|	562.81	501.77	|	1.80	2.00
`[]`	`True`	|	505.81	521.20	|	2.00	1.93

Below, time metrics for batch size = 4, input length = 64.

operators_to_quantize	per_channel	|	latency_mean (original, ms)	latency_mean (optimized, ms)	|	throughput (original, /s)	throughput (optimized, /s)
`['Add', 'MatMul']`	`False`	|	654.58	258.54	|	1.53	3.93
`['Add', 'MatMul']`	`True`	|	617.44	234.05	|	1.67	4.33
`['Add']`	`False`	|	661.51	478.81	|	1.53	2.13
`['Add']`	`True`	|	657.01	660.23	|	1.53	1.53
`[]`	`False`	|	661.64	474.28	|	1.53	2.13
`[]`	`True`	|	661.29	471.09	|	1.53	2.13

Below, time metrics for batch size = 4, input length = 128.

operators_to_quantize	per_channel	|	latency_mean (original, ms)	latency_mean (optimized, ms)	|	throughput (original, /s)	throughput (optimized, /s)
`['Add', 'MatMul']`	`False`	|	654.80	219.38	|	1.53	4.60
`['Add', 'MatMul']`	`True`	|	663.50	222.37	|	1.53	4.53
`['Add']`	`False`	|	625.56	529.02	|	1.60	1.93
`['Add']`	`True`	|	655.08	499.41	|	1.53	2.07
`[]`	`False`	|	655.92	473.01	|	1.53	2.13
`[]`	`True`	|	505.54	659.92	|	2.00	1.53

Below, time metrics for batch size = 8, input length = 32.

operators_to_quantize	per_channel	|	latency_mean (original, ms)	latency_mean (optimized, ms)	|	throughput (original, /s)	throughput (optimized, /s)
`['Add', 'MatMul']`	`False`	|	968.83	443.80	|	1.07	2.27
`['Add', 'MatMul']`	`True`	|	1255.70	489.55	|	0.80	2.07
`['Add']`	`False`	|	1301.35	938.14	|	0.80	1.07
`['Add']`	`True`	|	1279.54	931.91	|	0.80	1.13
`[]`	`False`	|	1292.66	1318.07	|	0.80	0.80
`[]`	`True`	|	1290.35	1314.74	|	0.80	0.80

Below, time metrics for batch size = 8, input length = 64.

operators_to_quantize	per_channel	|	latency_mean (original, ms)	latency_mean (optimized, ms)	|	throughput (original, /s)	throughput (optimized, /s)
`['Add', 'MatMul']`	`False`	|	1305.45	438.06	|	0.80	2.33
`['Add', 'MatMul']`	`True`	|	1296.68	450.40	|	0.80	2.27
`['Add']`	`False`	|	968.21	949.81	|	1.07	1.07
`['Add']`	`True`	|	1012.35	1317.46	|	1.00	0.80
`[]`	`False`	|	1213.91	961.79	|	0.87	1.07
`[]`	`True`	|	956.39	945.41	|	1.07	1.07

Below, time metrics for batch size = 8, input length = 128.

operators_to_quantize	per_channel	|	latency_mean (original, ms)	latency_mean (optimized, ms)	|	throughput (original, /s)	throughput (optimized, /s)
`['Add', 'MatMul']`	`False`	|	1120.12	497.17	|	0.93	2.07
`['Add', 'MatMul']`	`True`	|	1289.50	443.46	|	0.80	2.27
`['Add']`	`False`	|	1294.65	930.97	|	0.80	1.13
`['Add']`	`True`	|	1181.21	933.82	|	0.87	1.13
`[]`	`False`	|	1245.61	1318.07	|	0.87	0.80
`[]`	`True`	|	1285.81	1318.82	|	0.80	0.80
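
For readers who want to post-process the time tables, the following is a minimal sketch of parsing one data row into a record. It assumes that cells are tab-separated and that the pipe cells (`|` or an escaped `\|`) are purely visual separators; the function name `parse_time_row` and the dictionary keys are hypothetical.

```python
import ast

def parse_time_row(line: str) -> dict:
    """Parse one tab-separated row of a time-metrics table."""
    cells = [cell.strip() for cell in line.split("\t")]
    # Drop empty cells and the visual separator cells.
    cells = [cell for cell in cells if cell not in ("", "|", "\\|")]
    ops, per_channel, lat_orig, lat_opt, thr_orig, thr_opt = cells
    return {
        # Backticks wrap Python literals such as ['Add', 'MatMul'] and False.
        "operators_to_quantize": ast.literal_eval(ops.strip("`")),
        "per_channel": ast.literal_eval(per_channel.strip("`")),
        "latency_original_ms": float(lat_orig),
        "latency_optimized_ms": float(lat_opt),
        "throughput_original": float(thr_orig),
        "throughput_optimized": float(thr_opt),
    }

# First row of the batch size = 8, input length = 128 table.
row = "`['Add', 'MatMul']`\t`False`\t|\t1120.12\t497.17\t|\t0.93\t2.07"
record = parse_time_row(row)
print(record)
```

`ast.literal_eval` is used instead of `eval` so that only Python literals are accepted from the table text.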

fxmarty/20220712-h08m05s32_
