---
title: metric_for_TP_FP_samples
datasets:
-  
tags:
- evaluate
- metric
description: This metric is designed to measure the performance of sentence classification models on multiclass test datasets that contain both True Positive samples, where the label associated with the sentence is correctly assigned, and False Positive samples, where the label associated with the sentence is incorrectly assigned.
sdk: gradio
sdk_version: 3.0.2
app_file: app.py
pinned: false
---

# Metric Card for metric_for_TP_FP_samples

## Metric Description
This metric is designed to measure the performance of sentence classification models on multiclass test datasets that contain both True Positive samples, where the label associated with the sentence is correctly assigned, and False Positive samples, where the label associated with the sentence is incorrectly assigned.

## How to Use
In addition to the usual *predictions* and *references* inputs, this metric accepts a *kwarg* named *prediction_strategies*, a list describing the prediction strategies the metric should apply (see the Inputs section for the exact format).

Add *predictions*, *references* and *prediction_strategies* as follows:
```
import evaluate

# metric_selector is the metric id, e.g. "gorkaartola/metric_for_tp_fp_samples"
metric = evaluate.load(metric_selector)
metric.add_batch(predictions=predictions, references=references)
results = metric.compute(prediction_strategies=prediction_strategies)
```
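
For orientation, here is a minimal end-to-end sketch with dummy data. The array shapes follow the Inputs section below, the values are purely illustrative assumptions, and the metric id is taken from the Space URL in the Citation section.

```
import numpy as np
import evaluate

# Dummy example: 4 sentences, 3 classes (values are illustrative only).
# predictions: softmax entailment scores per class, shape (n_sentences, n_classes).
predictions = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
], dtype=np.float32)

# references: [label_ids, nli_label] per sentence, shape (n_sentences, 2).
# nli_label 0 = True Positive sample, 2 = False Positive sample.
references = np.array([
    [0, 0],
    [1, 0],
    [2, 2],
    [0, 2],
], dtype=np.int32)

metric = evaluate.load("gorkaartola/metric_for_tp_fp_samples")  # Space id from the Citation section
metric.add_batch(predictions=predictions, references=references)
results = metric.compute(prediction_strategies=[["argmax_max"], ["topk", 2]])
for strategy, table in results.items():
    print(strategy)
    print(table)
```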

The test datasets must provide at least the following fields (not necessarily with these names):
- *title*, containing the sentence to be classified, which is compared with different queries representing each class.
- *label_ids*, containing the *id* of the class the sample refers to. Including samples of all classes is advised.
- *nli_label*, which is '0' if the sample is a True Positive and '2' if it is a False Positive, i.e. the *label_ids* is incorrectly assigned to the *title*. Including both True Positive and False Positive samples for every class is advised.
		
Example:
|title                                                                              |label_ids  |nli_label   |
|-----------------------------------------------------------------------------------|:---------:|:----------:|
|'Together we can save the arctic': celebrity advocacy and the Rio Earth Summit 2012|     8     |     0      |
|Tuple-based semantic and structural mapping for a sustainable interoperability     |     16    |     2      |
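
As a hedged illustration (the row structure and column names are assumptions matching the example table above), the *references* array described in the Inputs section can be assembled from such a dataset as follows:

```
import numpy as np

# Hypothetical rows matching the example table above.
test_dataset = [
    {"title": "'Together we can save the arctic': celebrity advocacy and the Rio Earth Summit 2012",
     "label_ids": 8, "nli_label": 0},
    {"title": "Tuple-based semantic and structural mapping for a sustainable interoperability",
     "label_ids": 16, "nli_label": 2},
]

# Stack label_ids and nli_label into the (n_sentences, 2) int32 array
# expected by the metric's *references* input.
references = np.array(
    [[row["label_ids"], row["nli_label"]] for row in test_dataset],
    dtype=np.int32,
)
print(references)  # 2 rows of [label_ids, nli_label]
```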

### Inputs

- *predictions*, *(numpy.array(float32)[sentences to classify, number of classes])*: numpy array with the softmax logit values of the entailment dimension of the NLI inference for each sentence to be classified against each class.
- *references*, *(numpy.array(int32)[sentences to classify, 2])*: numpy array with the reference *label_ids* and *nli_label* of the sentences to be classified, as given in the *test_dataset*.
- a *kwarg* named *prediction_strategies*, *(list(list(str, int(optional))))*: each inner list describes a desired prediction strategy. The *prediction_strategies* implemented in this metric are the following (a short illustrative sketch of their behavior follows the example below):
	- *argmax*, which takes the highest value of the softmax inference logits to select the prediction. Syntax: *["argmax_max"]*
	- *threshold*, which takes all softmax inference logits above a certain value to select the predictions. Syntax: *["threshold", desired value]*
	- *topk*, which takes the highest *k* softmax inference logits to select the predictions. Syntax: *["topk", desired value]*

	Example:

```
prediction_strategies = [['argmax_max'], ['threshold', 0.5], ['topk', 3]]
```
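
The following is a minimal sketch, not the metric's internal implementation, of what each strategy selects from one row of softmax scores; the values are illustrative assumptions.

```
import numpy as np

scores = np.array([0.05, 0.62, 0.21, 0.12], dtype=np.float32)  # softmax scores for one sentence

# argmax: the single highest-scoring class.
argmax_pred = [int(np.argmax(scores))]               # -> [1]

# threshold: every class whose score exceeds the given value.
threshold_pred = np.where(scores > 0.5)[0].tolist()  # -> [1]

# topk: the k highest-scoring classes.
k = 3
topk_pred = np.argsort(scores)[::-1][:k].tolist()    # -> [1, 2, 3]
```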
		

### Output Values

- *dict*, with the names of the applied *prediction_strategies* as keys and, for each, a *pandas.DataFrame* containing a detailed table of metrics: recall, precision, f1-score and accuracy of the predictions for each class, plus overall micro and macro averages.
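
A minimal sketch of how the returned object could be inspected, assuming the structure described above; the column and row labels in the stand-in DataFrame are assumptions, not the metric's exact output.

```
import pandas as pd

# Illustrative stand-in for the returned object: one DataFrame per strategy.
results = {
    "argmax_max": pd.DataFrame(
        {"precision": [0.75, 0.60], "recall": [0.80, 0.55],
         "f1-score": [0.77, 0.57], "accuracy": [0.78, 0.58]},
        index=["class_0", "class_1"],
    ),
}

for strategy_name, report in results.items():
    print(f"== {strategy_name} ==")
    print(report)  # per-class precision/recall/f1-score/accuracy table
```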

## Citation
BibLaTeX
```
@online{TP_FP_metric,
  author = {Gorka Artola},
  title = {Metric for True Positive and False Positive Samples},
  year = 2022,
  url = {https://huggingface.co/spaces/gorkaartola/metric_for_tp_fp_samples},
  urldate = {2022-08-11}
}
```