task: token-classification
Backend: sagemaker-training
Backend args: {'instance_type': 'ml.g4dn.2xlarge', 'supported_instructions': None}
Number of evaluation samples: All dataset

Fixed parameters:

model_name_or_path: elastic/distilbert-base-uncased-finetuned-conll03-english
dataset:
- path: conll2003
- eval_split: validation
- data_keys: {'primary': 'tokens'}
- ref_keys: ['ner_tags']
- calibration_split: train
per_channel: False
calibration:
- method: minmax
- num_calibration_samples: 100
framework: onnxruntime
framework_args:
- opset: 11
- optimization_level: 1
aware_training: False

Benchmarked parameters:

quantization_approach: dynamic, static
operators_to_quantize: ['Add'], ['Add', 'MatMul']
node_exclusion: [], ['layernorm', 'gelu', 'residual', 'gather', 'softmax']

Evaluation

Non-time metrics

quantization_approach	operators_to_quantize	node_exclusion		precision (original)	precision (optimized)		recall (original)	recall (optimized)		f1 (original)	f1 (optimized)		accuracy (original)	accuracy (optimized)
`dynamic`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	0.936	0.934	\|	0.944	0.942	\|	0.940	0.938	\|	0.988	0.988
`dynamic`	`['Add', 'MatMul']`	`[]`	\|	0.936	0.934	\|	0.944	0.942	\|	0.940	0.938	\|	0.988	0.988
`dynamic`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	0.936	0.936	\|	0.944	0.944	\|	0.940	0.940	\|	0.988	0.988
`dynamic`	`['Add']`	`[]`	\|	0.936	0.936	\|	0.944	0.944	\|	0.940	0.940	\|	0.988	0.988
`static`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	0.936	0.904	\|	0.944	0.921	\|	0.940	0.912	\|	0.988	0.984
`static`	`['Add', 'MatMul']`	`[]`	\|	0.936	0.065	\|	0.944	0.243	\|	0.940	0.103	\|	0.988	0.357
`static`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	0.936	0.909	\|	0.944	0.930	\|	0.940	0.919	\|	0.988	0.986
`static`	`['Add']`	`[]`	\|	0.936	0.050	\|	0.944	0.160	\|	0.940	0.076	\|	0.988	0.311

Time metrics

Time benchmarks were run for 15 seconds per config.

Below, time metrics for batch size = 1, input length = 32.

quantization_approach	operators_to_quantize	node_exclusion		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	32.90	7.03	\|	30.40	142.20
`dynamic`	`['Add', 'MatMul']`	`[]`	\|	48.27	7.68	\|	20.73	130.33
`dynamic`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	33.74	14.73	\|	29.67	67.93
`dynamic`	`['Add']`	`[]`	\|	33.49	14.17	\|	29.87	70.60
`static`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	47.72	8.20	\|	21.00	121.93
`static`	`['Add', 'MatMul']`	`[]`	\|	47.87	10.58	\|	20.93	94.60
`static`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	45.77	19.00	\|	21.87	52.67
`static`	`['Add']`	`[]`	\|	44.67	18.77	\|	22.40	53.33

Below, time metrics for batch size = 1, input length = 64.

quantization_approach	operators_to_quantize	node_exclusion		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	59.15	13.60	\|	16.93	73.53
`dynamic`	`['Add', 'MatMul']`	`[]`	\|	44.01	12.60	\|	22.73	79.40
`dynamic`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	60.50	29.87	\|	16.53	33.53
`dynamic`	`['Add']`	`[]`	\|	45.35	24.10	\|	22.07	41.53
`static`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	59.98	16.08	\|	16.73	62.20
`static`	`['Add', 'MatMul']`	`[]`	\|	43.23	19.02	\|	23.20	52.60
`static`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	43.15	32.96	\|	23.20	30.40
`static`	`['Add']`	`[]`	\|	44.01	31.68	\|	22.80	31.60

Below, time metrics for batch size = 1, input length = 128.

quantization_approach	operators_to_quantize	node_exclusion		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	55.20	25.72	\|	18.13	38.93
`dynamic`	`['Add', 'MatMul']`	`[]`	\|	73.52	26.70	\|	13.67	37.47
`dynamic`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	71.60	53.26	\|	14.00	18.80
`dynamic`	`['Add']`	`[]`	\|	70.39	56.68	\|	14.27	17.67
`static`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	71.34	31.75	\|	14.07	31.53
`static`	`['Add', 'MatMul']`	`[]`	\|	73.55	37.95	\|	13.60	26.40
`static`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	70.28	62.70	\|	14.27	16.00
`static`	`['Add']`	`[]`	\|	63.86	61.64	\|	15.67	16.27

Below, time metrics for batch size = 4, input length = 32.

quantization_approach	operators_to_quantize	node_exclusion		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	70.41	22.67	\|	14.27	44.13
`dynamic`	`['Add', 'MatMul']`	`[]`	\|	71.65	21.44	\|	14.00	46.67
`dynamic`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	71.72	55.16	\|	14.00	18.13
`dynamic`	`['Add']`	`[]`	\|	55.56	43.87	\|	18.00	22.80
`static`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	55.45	27.83	\|	18.07	36.00
`static`	`['Add', 'MatMul']`	`[]`	\|	66.57	34.45	\|	15.07	29.07
`static`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	55.23	59.31	\|	18.13	16.87
`static`	`['Add']`	`[]`	\|	58.80	66.03	\|	17.07	15.20

Below, time metrics for batch size = 4, input length = 64.

quantization_approach	operators_to_quantize	node_exclusion		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	117.71	43.93	\|	8.53	22.80
`dynamic`	`['Add', 'MatMul']`	`[]`	\|	90.01	43.27	\|	11.13	23.13
`dynamic`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	94.34	107.02	\|	10.60	9.40
`dynamic`	`['Add']`	`[]`	\|	119.11	82.46	\|	8.40	12.13
`static`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	120.57	54.70	\|	8.33	18.33
`static`	`['Add', 'MatMul']`	`[]`	\|	120.00	57.85	\|	8.40	17.33
`static`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	119.57	92.50	\|	8.40	10.87
`static`	`['Add']`	`[]`	\|	117.35	102.09	\|	8.53	9.80

Below, time metrics for batch size = 4, input length = 128.

quantization_approach	operators_to_quantize	node_exclusion		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	220.69	94.33	\|	4.53	10.67
`dynamic`	`['Add', 'MatMul']`	`[]`	\|	170.04	81.68	\|	5.93	12.27
`dynamic`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	188.59	171.79	\|	5.33	5.87
`dynamic`	`['Add']`	`[]`	\|	219.80	163.62	\|	4.60	6.13
`static`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	220.25	94.05	\|	4.60	10.67
`static`	`['Add', 'MatMul']`	`[]`	\|	222.90	135.06	\|	4.53	7.47
`static`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	177.41	211.89	\|	5.67	4.73
`static`	`['Add']`	`[]`	\|	168.30	201.88	\|	6.00	5.00

Below, time metrics for batch size = 8, input length = 32.

quantization_approach	operators_to_quantize	node_exclusion		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	106.46	42.35	\|	9.47	23.67
`dynamic`	`['Add', 'MatMul']`	`[]`	\|	88.68	43.33	\|	11.33	23.13
`dynamic`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	91.32	92.08	\|	11.00	10.87
`dynamic`	`['Add']`	`[]`	\|	88.33	94.18	\|	11.33	10.67
`static`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	107.47	44.74	\|	9.33	22.40
`static`	`['Add', 'MatMul']`	`[]`	\|	118.39	64.56	\|	8.47	15.53
`static`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	87.05	111.36	\|	11.53	9.00
`static`	`['Add']`	`[]`	\|	116.96	98.82	\|	8.60	10.13

Below, time metrics for batch size = 8, input length = 64.

quantization_approach	operators_to_quantize	node_exclusion		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	165.67	87.71	\|	6.07	11.47
`dynamic`	`['Add', 'MatMul']`	`[]`	\|	214.59	87.88	\|	4.67	11.40
`dynamic`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	216.06	163.75	\|	4.67	6.13
`dynamic`	`['Add']`	`[]`	\|	176.69	209.28	\|	5.67	4.80
`static`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	215.12	86.90	\|	4.67	11.53
`static`	`['Add', 'MatMul']`	`[]`	\|	215.99	130.39	\|	4.67	7.73
`static`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	213.87	224.50	\|	4.73	4.47
`static`	`['Add']`	`[]`	\|	211.16	193.01	\|	4.80	5.20

Below, time metrics for batch size = 8, input length = 128.

quantization_approach	operators_to_quantize	node_exclusion		latency_mean (original, ms)	latency_mean (optimized, ms)		throughput (original, /s)	throughput (optimized, /s)
`dynamic`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	391.16	183.35	\|	2.60	5.47
`dynamic`	`['Add', 'MatMul']`	`[]`	\|	414.42	154.52	\|	2.47	6.53
`dynamic`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	314.12	323.94	\|	3.20	3.13
`dynamic`	`['Add']`	`[]`	\|	408.15	325.03	\|	2.47	3.13
`static`	`['Add', 'MatMul']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	337.57	205.59	\|	3.00	4.87
`static`	`['Add', 'MatMul']`	`[]`	\|	375.10	225.09	\|	2.67	4.47
`static`	`['Add']`	`['layernorm', 'gelu', 'residual', 'gather', 'softmax']`	\|	409.68	493.00	\|	2.47	2.07
`static`	`['Add']`	`[]`	\|	397.28	397.74	\|	2.53	2.53

fxmarty
/

20220713-h08m19s38_example_conll2003

Evaluation

Non-time metrics

Time metrics

Dataset used to train fxmarty/20220713-h08m19s38_example_conll2003