Edit model card

task: token-classification
Backend: sagemaker-training
Backend args: {'instance_type': 'ml.m5.2xlarge', 'supported_instructions': 'avx512'}
Number of evaluation samples: 100

Fixed parameters:

  • model_name_or_path: elastic/distilbert-base-uncased-finetuned-conll03-english
  • dataset:
    • path: conll2003
    • eval_split: validation
    • data_keys: {'primary': 'tokens'}
    • ref_keys: ['ner_tags']
    • calibration_split: train
  • node_exclusion: []
  • per_channel: False
  • calibration:
    • method: minmax
    • num_calibration_samples: 100
  • framework: onnxruntime
  • framework_args:
    • opset: 11
    • optimization_level: 1
  • aware_training: False

Benchmarked parameters:

  • quantization_approach: dynamic, static
  • operators_to_quantize: ['Add', 'MatMul'], ['Add']

Evaluation

Non-time metrics

quantization_approach operators_to_quantize precision (original) precision (optimized) recall (original) recall (optimized) f1 (original) f1 (optimized) accuracy (original) accuracy (optimized)
dynamic ['Add', 'MatMul'] | 0.974 0.974 | 0.955 0.949 | 0.964 0.962 | 0.990 0.989
dynamic ['Add'] | 0.974 0.974 | 0.955 0.955 | 0.964 0.964 | 0.990 0.990
static ['Add', 'MatMul'] | 0.974 0.081 | 0.955 0.222 | 0.964 0.118 | 0.990 0.467
static ['Add'] | 0.974 0.073 | 0.955 0.182 | 0.964 0.105 | 0.990 0.290

Time metrics

Time benchmarks were run for 3 seconds per config.

Below, time metrics for batch size = 1, input length = 64.

quantization_approach operators_to_quantize latency_mean (original, ms) latency_mean (optimized, ms) throughput (original, /s) throughput (optimized, /s)
dynamic ['Add', 'MatMul'] | 59.35 21.91 | 17.00 45.67
dynamic ['Add'] | 59.18 29.24 | 17.00 34.33
static ['Add', 'MatMul'] | 59.25 28.31 | 17.00 35.33
static ['Add'] | 58.77 31.80 | 17.33 31.67
Downloads last month
0
Inference Examples
Unable to determine this model's library. Check the docs .

Dataset used to train fxmarty/20220711-h10m00s56_example_conll2003