---
language: en
tags:
- text-classification
- onnx
- bge-small-en-v1.5
- emotions
- multi-class-classification
- multi-label-classification
datasets:
- go_emotions
models:
- BAAI/bge-small-en-v1.5
license: mit
inference: false
widget:
- text: ONNX is so much faster, it's very handy!
---
### Overview
This is a multi-label, multi-class linear classifier for emotions that works with [BGE-small-en-v1.5 embeddings](https://huggingface.co/BAAI/bge-small-en-v1.5), trained on the [go_emotions](https://huggingface.co/datasets/go_emotions) dataset.
### Labels
The 28 labels from the [go_emotions](https://huggingface.co/datasets/go_emotions) dataset are:
```
['admiration', 'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion', 'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust', 'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy', 'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief', 'remorse', 'sadness', 'surprise', 'neutral']
```
### Metrics (exact match of labels per item)
This is a multi-label, multi-class dataset, so each label is effectively a separate binary classification. Evaluating across all labels for each item in the go_emotions test split gives the metrics below.
Optimising the threshold per label to optimise the F1 metric, the metrics (evaluated on the go_emotions test split) are:
- Precision: 0.445
- Recall: 0.476
- F1: 0.449
Weighted by each label's relative support in the dataset, these become:
- Precision: 0.472
- Recall: 0.582
- F1: 0.514
Using a fixed threshold of 0.5 to convert the scores to binary predictions for each label, the metrics (evaluated on the go_emotions test split, and unweighted by support) are:
- Precision: 0.602
- Recall: 0.250
- F1: 0.303
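As a rough sketch of what "fixed threshold of 0.5, unweighted by support" means, the macro-averaged metrics can be computed per label and then averaged. The arrays below are random stand-ins, not go_emotions data, and this is an illustrative NumPy implementation rather than the exact evaluation script used for this card:

```python
import numpy as np

# Stand-in data: 100 items, 28 binary labels (go_emotions has 28 labels)
rng = np.random.default_rng(0)
n_items, n_labels = 100, 28
y_true = rng.integers(0, 2, size=(n_items, n_labels))  # placeholder ground truth
y_scores = rng.random(size=(n_items, n_labels))        # placeholder model scores

# Fixed threshold of 0.5 converts scores to binary predictions per label
y_pred = (y_scores >= 0.5).astype(int)

# Per-label true positives, false positives, false negatives
tp = ((y_pred == 1) & (y_true == 1)).sum(axis=0)
fp = ((y_pred == 1) & (y_true == 0)).sum(axis=0)
fn = ((y_pred == 0) & (y_true == 1)).sum(axis=0)

precision = tp / np.maximum(tp + fp, 1)
recall = tp / np.maximum(tp + fn, 1)
f1 = np.where(precision + recall > 0,
              2 * precision * recall / np.maximum(precision + recall, 1e-12),
              0.0)

# Unweighted (macro) averages across the 28 labels
print(f"macro P={precision.mean():.3f} R={recall.mean():.3f} F1={f1.mean():.3f}")
```

The support-weighted variant would instead average with weights proportional to each label's positive count in the test split.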
### Metrics (per-label)
This is a multi-label, multi-class dataset, so each label is effectively a separate binary classification and metrics are better measured per label.
Optimising the threshold per label to optimise the F1 metric, the metrics (evaluated on the go_emotions test split) are:
| | f1 | precision | recall | support | threshold |
| -------------- | ----- | --------- | ------ | ------- | --------- |
| admiration | 0.583 | 0.574 | 0.593 | 504 | 0.30 |
| amusement | 0.668 | 0.722 | 0.621 | 264 | 0.25 |
| anger | 0.350 | 0.309 | 0.404 | 198 | 0.15 |
| annoyance | 0.299 | 0.318 | 0.281 | 320 | 0.20 |
| approval | 0.338 | 0.281 | 0.425 | 351 | 0.15 |
| caring | 0.321 | 0.323 | 0.319 | 135 | 0.20 |
| confusion | 0.384 | 0.313 | 0.497 | 153 | 0.15 |
| curiosity | 0.467 | 0.432 | 0.507 | 284 | 0.20 |
| desire | 0.426 | 0.381 | 0.482 | 83 | 0.20 |
| disappointment | 0.210 | 0.147 | 0.364 | 151 | 0.10 |
| disapproval | 0.366 | 0.288 | 0.502 | 267 | 0.15 |
| disgust | 0.416 | 0.409 | 0.423 | 123 | 0.20 |
| embarrassment | 0.370 | 0.341 | 0.405 | 37 | 0.30 |
| excitement | 0.313 | 0.368 | 0.272 | 103 | 0.25 |
| fear | 0.615 | 0.677 | 0.564 | 78 | 0.40 |
| gratitude | 0.828 | 0.810 | 0.847 | 352 | 0.25 |
| grief | 0.545 | 0.600 | 0.500 | 6 | 0.85 |
| joy | 0.455 | 0.429 | 0.484 | 161 | 0.20 |
| love | 0.642 | 0.673 | 0.613 | 238 | 0.30 |
| nervousness | 0.350 | 0.412 | 0.304 | 23 | 0.60 |
| optimism | 0.439 | 0.417 | 0.462 | 186 | 0.20 |
| pride | 0.480 | 0.667 | 0.375 | 16 | 0.70 |
| realization | 0.232 | 0.191 | 0.297 | 145 | 0.10 |
| relief | 0.353 | 0.500 | 0.273 | 11 | 0.50 |
| remorse | 0.643 | 0.529 | 0.821 | 56 | 0.20 |
| sadness | 0.526 | 0.497 | 0.558 | 156 | 0.20 |
| surprise | 0.329 | 0.318 | 0.340 | 141 | 0.15 |
| neutral | 0.634 | 0.528 | 0.794 | 1787 | 0.30 |
The thresholds are stored in `thresholds.json`.
### Use with ONNXRuntime
The input to the model is called `logits`, and there is one output per label. Each output is a 2D array with one row per input row and two columns: the first is the probability of the negative case, and the second the probability of the positive case.
```python
# Assuming you have embeddings from BAAI/bge-small-en-v1.5 for the input sentences,
# e.g. produced with sentence-transformers (huggingface.co/BAAI/bge-small-en-v1.5)
# or with an ONNX version (huggingface.co/Xenova/bge-small-en-v1.5)
import onnxruntime as ort

print(embeddings.shape)  # e.g. a batch of 1 sentence
# > (1, 384)

sess = ort.InferenceSession("path_to_model_dot_onnx", providers=['CPUExecutionProvider'])
outputs = [o.name for o in sess.get_outputs()]  # list of labels, in the order of the outputs
preds_onnx = sess.run(outputs, {'logits': embeddings})
# preds_onnx is a list with 28 entries, one per label,
# each a numpy array of shape (1, 2) given the input was a batch of 1

print(outputs[0])
# > surprise
print(preds_onnx[0])
# > array([[0.97136074, 0.02863926]], dtype=float32)

# load thresholds.json and use the per-label threshold to convert each
# positive-case score into a binary prediction
```
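The final step, converting scores to binary predictions, could look like the following sketch. The label names, scores, and threshold values here are illustrative stand-ins; in practice the thresholds come from `thresholds.json` and the scores from the second column of each ONNX output above:

```python
import json

import numpy as np

# Stand-in for: thresholds = json.load(open("thresholds.json"))
# thresholds.json is assumed to map each label name to its threshold
thresholds = json.loads('{"surprise": 0.15, "joy": 0.20}')

labels = ["surprise", "joy"]              # stand-in for the session output names
pos_scores = np.array([0.029, 0.61])      # positive-case score per label

# Apply each label's threshold to get a binary prediction
binary = {lab: int(score >= thresholds[lab])
          for lab, score in zip(labels, pos_scores)}
print(binary)
# > {'surprise': 0, 'joy': 1}
```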
### Commentary on the dataset
Some labels (e.g. gratitude), considered independently, perform very strongly, whilst others (e.g. relief) perform very poorly.
This is a challenging dataset. Labels such as relief have far fewer examples in the training data (fewer than 100 out of the 40k+, and only 11 in the test split).
There is also some ambiguity and/or labelling error visible in the go_emotions training data that is suspected to constrain performance. Cleaning the dataset to reduce mistakes, ambiguity, conflicts and duplication in the labelling would likely produce a higher-performing model.