---
language: en
tags:
- text-classification
- onnx
- emotions
- multi-class-classification
- multi-label-classification
datasets:
- go_emotions
license: mit
inference: false
widget:
- text: ONNX is so much faster, it's very handy!
---
### Overview
This is a multi-label, multi-class linear classifier for emotions that works with [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) embeddings, having been trained on the [go_emotions](https://huggingface.co/datasets/go_emotions) dataset.
### Labels
The 28 labels from the [go_emotions](https://huggingface.co/datasets/go_emotions) dataset are:
```
['admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']
```
### Metrics (exact match of labels per item)
This is a multi-label, multi-class dataset, so each label is effectively a separate binary classification. The metrics below are evaluated across all labels per item on the go_emotions test split.
With the threshold for each label tuned to maximise F1, the metrics (unweighted across labels) are:
- Precision: 0.384
- Recall: 0.438
- F1: 0.397
Weighted by each label's support in the dataset, the same metrics are:
- Precision: 0.443
- Recall: 0.552
- F1: 0.484
Using a fixed threshold of 0.5 to convert each label's score to a binary prediction, the metrics (again on the go_emotions test split, unweighted by support) are:
- Precision: 0.551
- Recall: 0.211
- F1: 0.261
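As an illustration of the per-label threshold tuning described above, here is a minimal sketch. It uses a simple grid search over candidate thresholds with a hand-rolled F1 (not necessarily the exact procedure used to produce the numbers in this card), applied to a toy set of scores for a single label:

```python
import numpy as np

def f1(y_true, y_pred):
    # binary F1 for one label
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(y_true, y_score, grid=np.arange(0.05, 1.0, 0.05)):
    # grid-search the decision threshold that maximises F1 for this label
    f1s = [f1(y_true, (y_score >= t).astype(int)) for t in grid]
    return float(grid[int(np.argmax(f1s))])

# toy scores: positive cases tend to score higher
y_true = np.array([0, 0, 0, 1, 1])
y_score = np.array([0.05, 0.20, 0.40, 0.55, 0.90])
print(best_threshold(y_true, y_score))  # any threshold in (0.40, 0.55] is perfect here
```

Repeating this per label yields a vector of 28 thresholds like the one in the table below.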
### Metrics (per-label)
This is a multi-label, multi-class dataset, so each label is effectively a separate binary classification, and metrics are better measured per label.
With the threshold for each label tuned to maximise F1, the per-label metrics (evaluated on the go_emotions test split) are:
| | f1 | precision | recall | support | threshold |
| -------------- | ----- | --------- | ------ | ------- | --------- |
| admiration | 0.529 | 0.499 | 0.563 | 504 | 0.25 |
| amusement | 0.733 | 0.672 | 0.807 | 264 | 0.20 |
| anger | 0.394 | 0.363 | 0.429 | 198 | 0.15 |
| annoyance | 0.293 | 0.252 | 0.350 | 320 | 0.15 |
| approval | 0.292 | 0.345 | 0.254 | 351 | 0.20 |
| caring | 0.320 | 0.270 | 0.393 | 135 | 0.15 |
| confusion | 0.291 | 0.276 | 0.307 | 153 | 0.15 |
| curiosity | 0.366 | 0.307 | 0.454 | 284 | 0.15 |
| desire | 0.317 | 0.269 | 0.386 | 83 | 0.15 |
| disappointment | 0.159 | 0.127 | 0.212 | 151 | 0.10 |
| disapproval | 0.306 | 0.341 | 0.277 | 267 | 0.20 |
| disgust | 0.405 | 0.412 | 0.398 | 123 | 0.20 |
| embarrassment | 0.364 | 0.414 | 0.324 | 37 | 0.35 |
| excitement | 0.296 | 0.232 | 0.408 | 103 | 0.15 |
| fear | 0.496 | 0.576 | 0.436 | 78 | 0.40 |
| gratitude | 0.793 | 0.787 | 0.798 | 352 | 0.30 |
| grief | 0.323 | 0.200 | 0.833 | 6 | 0.45 |
| joy | 0.402 | 0.341 | 0.491 | 161 | 0.15 |
| love | 0.640 | 0.679 | 0.605 | 238 | 0.30 |
| nervousness | 0.263 | 0.333 | 0.217 | 23 | 0.70 |
| optimism | 0.433 | 0.453 | 0.414 | 186 | 0.20 |
| pride | 0.429 | 0.500 | 0.375 | 16 | 0.50 |
| realization | 0.177 | 0.159 | 0.200 | 145 | 0.10 |
| relief | 0.182 | 0.182 | 0.182 | 11 | 0.40 |
| remorse | 0.541 | 0.500 | 0.589 | 56 | 0.30 |
| sadness | 0.461 | 0.467 | 0.455 | 156 | 0.20 |
| surprise | 0.302 | 0.299 | 0.305 | 141 | 0.15 |
| neutral | 0.620 | 0.505 | 0.803 | 1787 | 0.30 |
The thresholds are stored in `thresholds.json`.
### Use with ONNXRuntime
The input to the model is called `logits`, and there is one output per label. Each output produces a 2-d array with one row per input row and two columns: the first is the probability of the negative case, the second the probability of the positive case.
```python
# Assuming you have embeddings from all-MiniLM-L6-v2 for the input sentences
# E.g. produced from sentence-transformers such as:
# huggingface.co/sentence-transformers/all-MiniLM-L6-v2
# or from an ONNX version E.g. huggingface.co/Xenova/all-MiniLM-L6-v2
print(embeddings.shape) # E.g. a batch of 1 sentence
> (1, 384)
import onnxruntime as ort
sess = ort.InferenceSession("path_to_model_dot_onnx", providers=['CPUExecutionProvider'])
outputs = [o.name for o in sess.get_outputs()] # list of labels, in the order of the outputs
preds_onnx = sess.run(outputs, {'logits': embeddings})
# preds_onnx is a list with 28 entries, one per label,
# each with a numpy array of shape (1, 2) given the input was a batch of 1
print(outputs[0])
> surprise
print(preds_onnx[0])
> array([[0.97136074, 0.02863926]], dtype=float32)
# load thresholds.json and use that (per label) to convert the positive case score to a binary prediction
```
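To turn the per-label scores into binary predictions, compare each label's positive-case column against that label's threshold. A minimal sketch, using hard-coded stand-ins for the contents of `thresholds.json` and for the ONNX outputs (only two labels shown for brevity):

```python
import numpy as np

# stand-ins for thresholds.json and the per-label (batch, 2) ONNX outputs
thresholds = {"surprise": 0.15, "neutral": 0.30}
labels = ["surprise", "neutral"]
preds = [np.array([[0.971, 0.029]]), np.array([[0.40, 0.60]])]

# column 1 of each output is the positive-case probability
pos_scores = np.stack([p[:, 1] for p in preds], axis=1)  # shape (batch, n_labels)
cutoffs = np.array([thresholds[l] for l in labels])
binary = (pos_scores >= cutoffs).astype(int)
print(binary)  # [[0 1]]
```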
### Commentary on the dataset
Some labels (e.g. gratitude), when considered independently, perform very strongly, whilst others (e.g. relief) perform very poorly.
This is a challenging dataset. Labels such as relief have far fewer examples in the training data (fewer than 100 out of the 40k+, and only 11 in the test split).
But there is also some ambiguity and/or labelling error visible in the go_emotions training data that likely constrains performance. Cleaning the dataset to reduce the mistakes, ambiguity, conflicts and duplication in the labelling would likely produce a higher-performing model.