fxmarty's picture
add experience
d4ce69e
metadata
pipeline_tag: token-classification
datasets:
  - conll2003
metrics:
  - precision
  - recall
  - f1
  - accuracy
tags:
  - distilbert

task: token-classification
Backend: sagemaker-training
Backend args: {'instance_type': 'ml.g4dn.2xlarge', 'supported_instructions': None}
Number of evaluation samples: All dataset

Fixed parameters:

  • model_name_or_path: elastic/distilbert-base-uncased-finetuned-conll03-english
  • dataset:
    • path: conll2003
    • eval_split: validation
    • data_keys: {'primary': 'tokens'}
    • ref_keys: ['ner_tags']
    • calibration_split: train
  • quantization_approach: static
  • operators_to_quantize: ['Add', 'MatMul']
  • per_channel: False
  • calibration:
    • method: minmax
    • num_calibration_samples: 100
  • framework: onnxruntime
  • framework_args:
    • opset: 11
    • optimization_level: 1
  • aware_training: False

Benchmarked parameters:

  • node_exclusion: [], ['layernorm', 'gelu', 'residual', 'gather', 'softmax']

Evaluation

Non-time metrics

node_exclusion precision (original) precision (optimized) recall (original) recall (optimized) f1 (original) f1 (optimized) accuracy (original) accuracy (optimized)
['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 0.936 0.904 | 0.944 0.921 | 0.940 0.912 | 0.988 0.984
[] | 0.936 0.065 | 0.944 0.243 | 0.940 0.103 | 0.988 0.357

Time metrics

Time benchmarks were run for 15 seconds per config.

Below, time metrics for batch size = 4, input length = 64.

node_exclusion latency_mean (original, ms) latency_mean (optimized, ms) throughput (original, /s) throughput (optimized, /s)
['layernorm', 'gelu', 'residual', 'gather', 'softmax'] | 103.46 53.77 | 9.67 18.60
[] | 90.62 65.86 | 11.07 15.20