gorkaartola committed on
Commit 012d4cb
1 Parent(s): 297dc8d

Update README.md

Files changed (1)
  1. README.md +26 -20
README.md CHANGED
@@ -18,14 +18,22 @@ pinned: false
  This metric is specially designed to measure the performance of sentence classification models over multiclass test datasets containing both True Positive samples, in which the label associated with the sentence is correctly assigned, and False Positive samples, in which it is incorrectly assigned.

  ## How to Use
- In addition to the conventional *predictions* and *references* inputs, this metric includes a *kwarg* named *prediction_strategies (list(str))*, that refer to a family of prediction strategies that the metric can handle.

  The *prediction_strategies* implemented in this metric are:
  - *argmax*, which takes the highest value of the softmax inference logits to select the prediction.
  - *threshold*, which takes all softmax inference logits above a certain value to select the predictions.
  - *topk*, which takes the highest *k* softmax inference logits to select the predictions.

- The minimum fields required by this metric for the test datasets are the following:
  - *title* containing the first sentence to be compared with different queries representing each class.
  - *label_ids* containing the *id* of the class the sample refers to. Including samples of all the classes is advised.
  - *nli_label*, which is '0' if the sample represents a True Positive or '2' if the sample represents a False Positive, meaning that the *label_ids* is incorrectly assigned to the *title*. Including both True Positive and False Positive samples for all classes is advised.
@@ -39,27 +47,25 @@ The *prediction_strategies* implemented in this metric are:

  ### Inputs

- - *predictions*, *(numpy.array(float32)[sentences to classify,number of classes])*: numpy array with the softmax logits values of the entailment dimension of the inference on the sentences to be classified for each class.
  - *references*, *(numpy.array(int32)[sentences to classify, 2])*: numpy array with the reference *label_ids* and *nli_label* of the sentences to be classified, given in the *test_dataset*.
- - A *kwarg* named *prediction_strategies*, *(list(str))*, of prediction strategies which must be included within the options lists for the parameter *prediction_strategy_selector* in the [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py) file.

  ### Output Values

- *Explain what this metric outputs and provide an example of what the metric output looks like. Modules should return a dictionary with one or multiple key-value pairs, e.g. {"bleu" : 6.02}*
-
- *State the range of possible values that the metric's output can take, as well as what in that range is considered good. For example: "This metric can take on any value between 0 and 100, inclusive. Higher scores are better."*
-
- #### Values from Popular Papers
- *Give examples, preferrably with links to leaderboards or publications, to papers that have reported this metric, along with the values they have reported.*
-
- ### Examples
- *Give code examples of the metric being used. Try to include examples that clear up any potential ambiguity left from the metric description above. If possible, provide a range of examples that show both typical and atypical results, as well as examples where a variety of input parameters are passed.*
-
- ## Limitations and Bias
- *Note any known limitations or biases that the metric has, with links and references if possible.*

  ## Citation
- *Cite the source where this metric was introduced.*
-
- ## Further References
- *Add any useful further references.*
 
  This metric is specially designed to measure the performance of sentence classification models over multiclass test datasets containing both True Positive samples, in which the label associated with the sentence is correctly assigned, and False Positive samples, in which it is incorrectly assigned.

  ## How to Use
+ In addition to the classical *predictions* and *references* inputs, this metric includes a *kwarg* named *prediction_strategies*, which selects the family of prediction strategies that the metric applies (see its exact format under Inputs below).
+
+ Add the *predictions* and *references*, and pass the *prediction_strategies*, as follows:
+
+ ```
+ import evaluate
+
+ metric = evaluate.load("gorkaartola/metric_for_tp_fp_samples")
+ metric.add_batch(predictions = predictions, references = references)
+ results = metric.compute(prediction_strategies = prediction_strategies)
+ ```

  The *prediction_strategies* implemented in this metric are:
  - *argmax*, which takes the highest value of the softmax inference logits to select the prediction.
  - *threshold*, which takes all softmax inference logits above a certain value to select the predictions.
  - *topk*, which takes the highest *k* softmax inference logits to select the predictions.

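As an illustration only (this sketch is not part of the metric's code; the scores and variable names are invented), each strategy turns one row of softmax scores into a set of predicted class ids roughly as follows:

```
import numpy as np

scores = np.array([0.10, 0.55, 0.20, 0.15])  # softmax scores of one sentence over 4 classes

# argmax: the single highest-scoring class
argmax_pred = [int(np.argmax(scores))]                # [1]

# threshold: every class whose score is above the chosen value
threshold_pred = np.where(scores > 0.15)[0].tolist()  # [1, 2]

# topk: the k highest-scoring classes
k = 2
topk_pred = np.argsort(scores)[::-1][:k].tolist()     # [1, 2]
```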
+ The minimum fields required by this metric for the test datasets are the following (not necessarily with these names):
  - *title* containing the first sentence to be compared with different queries representing each class.
  - *label_ids* containing the *id* of the class the sample refers to. Including samples of all the classes is advised.
  - *nli_label*, which is '0' if the sample represents a True Positive or '2' if the sample represents a False Positive, meaning that the *label_ids* is incorrectly assigned to the *title*. Including both True Positive and False Positive samples for all classes is advised.
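For illustration, two test samples with these fields might look as follows; the sentence and class ids are invented:

```
test_samples = [
    {"title": "New solar technology cuts energy costs", "label_ids": 6, "nli_label": 0},  # True Positive
    {"title": "New solar technology cuts energy costs", "label_ids": 3, "nli_label": 2},  # False Positive
]
```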
 
  ### Inputs

+ - *predictions*, *(numpy.array(float32)[sentences to classify, number of classes])*: numpy array with the softmax logits values of the entailment dimension of the NLI inference on the sentences to be classified for each class.
  - *references*, *(numpy.array(int32)[sentences to classify, 2])*: numpy array with the reference *label_ids* and *nli_label* of the sentences to be classified, given in the *test_dataset*.
+ - A *kwarg* named *prediction_strategies = list(list(str, int(optional)))*, each inner *list(str, int(optional))* describing a desired prediction strategy as follows:
+   + *argmax*: *["argmax"]*.
+   + *threshold*: *["threshold", desired value]*.
+   + *topk*: *["topk", desired value]*.
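A minimal end-to-end sketch, assuming the metric is loaded by its `gorkaartola/metric_for_tp_fp_samples` id and using invented scores and references:

```
import evaluate
import numpy as np

# Entailment softmax scores of 3 sentences over 4 classes.
predictions = np.array([
    [0.70, 0.10, 0.15, 0.05],
    [0.05, 0.60, 0.20, 0.15],
    [0.25, 0.25, 0.30, 0.20],
], dtype=np.float32)

# For each sentence: [label_ids, nli_label], where nli_label 0 = True Positive and 2 = False Positive.
references = np.array([
    [0, 0],
    [1, 2],
    [3, 0],
], dtype=np.int32)

metric = evaluate.load("gorkaartola/metric_for_tp_fp_samples")
metric.add_batch(predictions=predictions, references=references)
results = metric.compute(prediction_strategies=[["argmax"], ["threshold", 0.5], ["topk", 2]])
```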
 
  ### Output Values

+ - *dict*, with the names of the used *prediction_strategies* as keys and, as values, a *pandas.DataFrame* containing a detailed table of metrics, including recall, precision, f1-score and accuracy of the predictions for each class, together with overall micro and macro averages.
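Assuming the result keys simply match the strategy names used above (this naming is an assumption, not confirmed by the source), the per-strategy report could be inspected like this:

```
report = results["argmax"]   # a pandas.DataFrame
print(report)                # per-class precision, recall, f1-score and accuracy, plus micro and macro averages
```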
 
  ## Citation
+ BibLaTeX:
+ ```
+ @online{TP_FP_metric,
+   author = {Gorka Artola},
+   title = {Testing Zero Shot Classification by SDGs},
+   year = 2022,
+   url = {https://huggingface.co/spaces/gorkaartola/metric_for_tp_fp_samples},
+   urldate = {2022-08-11}
+ }
+ ```