
Task: token-classification
Backend: sagemaker-training
Backend args: {'instance_type': 'ml.g4dn.2xlarge', 'supported_instructions': 'avx512_vnni'}
Number of evaluation samples: 1000

Fixed parameters:

  • model_name_or_path: elastic/distilbert-base-uncased-finetuned-conll03-english
  • dataset:
    • path: conll2003
    • eval_split: validation
    • data_keys: {'primary': 'tokens'}
    • ref_keys: ['ner_tags']
    • calibration_split: train
  • node_exclusion: []
  • per_channel: False
  • calibration:
    • method: minmax
    • num_calibration_samples: 100
  • framework: onnxruntime
  • framework_args:
    • opset: 11
    • optimization_level: 1
  • aware_training: False
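The `minmax` calibration method listed above derives quantization ranges from the smallest and largest values seen on the calibration samples. The following is a minimal, simplified sketch of that idea for a signed 8-bit affine scheme. It is not ONNX Runtime's implementation: the function names and the sample activations are hypothetical and chosen only for illustration.

```python
def minmax_qparams(values, qmin=-128, qmax=127):
    """Derive an affine (scale, zero_point) pair from the observed min/max."""
    # The range is widened to include 0.0 so that zero maps to an exact integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero range
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to the integer grid, clamping to the representable range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer back to its approximate float value."""
    return (q - zero_point) * scale


# Hypothetical activations standing in for values collected during calibration.
calibration_activations = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, zero_point = minmax_qparams(calibration_activations)

for x in calibration_activations:
    q = quantize(x, scale, zero_point)
    print(x, q, dequantize(q, scale, zero_point))
```

Because the range is set by the extremes alone, a single outlier in the calibration data widens the range and coarsens the grid for every other value.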

Benchmarked parameters:

  • quantization_approach: dynamic, static
  • operators_to_quantize: ['Add', 'MatMul'], ['Add']
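The two benchmarked parameters combine into four configurations, one per row of the result tables. As a small sketch of how that grid can be enumerated (the variable names are illustrative, not taken from the benchmarking tool):

```python
from itertools import product

# Values copied from the "Benchmarked parameters" list.
quantization_approaches = ["dynamic", "static"]
operators_to_quantize = [["Add", "MatMul"], ["Add"]]

# Cartesian product: every approach paired with every operator list.
configs = [
    {"quantization_approach": approach, "operators_to_quantize": ops}
    for approach, ops in product(quantization_approaches, operators_to_quantize)
]

for config in configs:
    print(config)
```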

Evaluation

Non-time metrics

| quantization_approach | operators_to_quantize | precision (original) | precision (optimized) | recall (original) | recall (optimized) | f1 (original) | f1 (optimized) | accuracy (original) | accuracy (optimized) |
|---|---|---|---|---|---|---|---|---|---|
| dynamic | ['Add', 'MatMul'] | 0.937 | 0.937 | 0.953 | 0.953 | 0.945 | 0.945 | 0.988 | 0.988 |
| dynamic | ['Add'] | 0.937 | 0.937 | 0.953 | 0.953 | 0.945 | 0.945 | 0.988 | 0.988 |
| static | ['Add', 'MatMul'] | 0.937 | 0.074 | 0.953 | 0.253 | 0.945 | 0.114 | 0.988 | 0.363 |
| static | ['Add'] | 0.937 | 0.065 | 0.953 | 0.186 | 0.945 | 0.096 | 0.988 | 0.340 |
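To make the quality gap between the two approaches explicit, the F1 columns can be turned into absolute drops. This is a small sketch using only the F1 values reported in the table; the data structure is an illustrative choice.

```python
# (quantization_approach, operators_to_quantize, f1 original, f1 optimized)
rows = [
    ("dynamic", ("Add", "MatMul"), 0.945, 0.945),
    ("dynamic", ("Add",), 0.945, 0.945),
    ("static", ("Add", "MatMul"), 0.945, 0.114),
    ("static", ("Add",), 0.945, 0.096),
]

# Absolute F1 lost after quantization, rounded to the table's precision.
f1_drop = {
    (approach, ops): round(original - optimized, 3)
    for approach, ops, original, optimized in rows
}

for key, drop in f1_drop.items():
    print(key, drop)
```

With these numbers, both dynamic configurations keep the original F1, while both static configurations lose more than 0.8 F1.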

Time metrics

Time benchmarks were run for 3 seconds per config.

Below, time metrics for batch size = 1, input length = 64.

| quantization_approach | operators_to_quantize | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s) |
|---|---|---|---|---|---|
| dynamic | ['Add', 'MatMul'] | 57.64 | 12.30 | 17.67 | 81.33 |
| dynamic | ['Add'] | 43.51 | 29.42 | 23.00 | 34.00 |
| static | ['Add', 'MatMul'] | 43.05 | 21.11 | 23.33 | 47.67 |
| static | ['Add'] | 43.50 | 37.93 | 23.00 | 26.67 |

Below, time metrics for batch size = 4, input length = 64.

| quantization_approach | operators_to_quantize | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s) |
|---|---|---|---|---|---|
| dynamic | ['Add', 'MatMul'] | 119.50 | 39.92 | 8.67 | 25.33 |
| dynamic | ['Add'] | 119.62 | 107.42 | 8.67 | 9.33 |
| static | ['Add', 'MatMul'] | 120.23 | 56.94 | 8.33 | 17.67 |
| static | ['Add'] | 119.10 | 130.78 | 8.67 | 7.67 |

Below, time metrics for batch size = 8, input length = 64.

| quantization_approach | operators_to_quantize | latency_mean (original, ms) | latency_mean (optimized, ms) | throughput (original, /s) | throughput (optimized, /s) |
|---|---|---|---|---|---|
| dynamic | ['Add', 'MatMul'] | 165.84 | 75.45 | 6.33 | 13.33 |
| dynamic | ['Add'] | 214.65 | 211.41 | 4.67 | 5.00 |
| static | ['Add', 'MatMul'] | 166.53 | 129.00 | 6.33 | 8.00 |
| static | ['Add'] | 214.81 | 256.95 | 4.67 | 4.00 |
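The latency columns are easier to compare as speedup ratios (original latency divided by optimized latency). The sketch below computes them from the mean latencies reported in the three time tables. Note that the original latency differs between rows, so each ratio uses that row's own baseline. The dictionary layout is an illustrative choice.

```python
# latencies[batch_size][(approach, operators)] = (original ms, optimized ms)
latencies = {
    1: {
        ("dynamic", ("Add", "MatMul")): (57.64, 12.30),
        ("dynamic", ("Add",)): (43.51, 29.42),
        ("static", ("Add", "MatMul")): (43.05, 21.11),
        ("static", ("Add",)): (43.50, 37.93),
    },
    4: {
        ("dynamic", ("Add", "MatMul")): (119.50, 39.92),
        ("dynamic", ("Add",)): (119.62, 107.42),
        ("static", ("Add", "MatMul")): (120.23, 56.94),
        ("static", ("Add",)): (119.10, 130.78),
    },
    8: {
        ("dynamic", ("Add", "MatMul")): (165.84, 75.45),
        ("dynamic", ("Add",)): (214.65, 211.41),
        ("static", ("Add", "MatMul")): (166.53, 129.00),
        ("static", ("Add",)): (214.81, 256.95),
    },
}

# A speedup above 1.0 means the quantized model is faster; below 1.0, slower.
speedups = {
    batch_size: {
        config: original / optimized
        for config, (original, optimized) in per_config.items()
    }
    for batch_size, per_config in latencies.items()
}

for batch_size, per_config in speedups.items():
    for (approach, ops), ratio in per_config.items():
        print(f"batch={batch_size} {approach} {list(ops)}: {ratio:.2f}x")
```

On these figures, dynamic quantization of both `Add` and `MatMul` gives the largest speedup at every batch size, and static quantization of `Add` alone is slower than the original model at batch sizes 4 and 8.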

Dataset used to train fxmarty/20220712-h07m20s32_example_conll2003