gorkaartola committed
Commit 1dccb8f
1 Parent(s): 5f789fe

Upload README.md

Files changed (1)
  1. README.md +36 -30
README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Zero Shot Classifier Tester
  emoji: 📊
  colorFrom: purple
  colorTo: yellow
@@ -9,40 +9,34 @@ app_file: app.py
  pinned: false
  ---

- # Card for Zero_Shot_Classifier_Tester

  ## Description
- With this app you can test and compare different zero shot approaches to classify sentences through Natural Language Inference (NLI) comparison of the sentences to be classified and diferent queries that represent each class, evaluating the *entailment* or *contradition* between two sentences.

  ## How to Use
- Please define the configuration to be tested selecting one of the avalable options for *test_dataset*, *metric_selector*, *model_selector*, *queries_selector* and *prompt_selector*, and as many choices of *preditions_strategy_selector*. The available input choices for each parameter for the tests are included in the file [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py) or its clone if run the app locally, and can be extended with further posibilities by including them in the same file. To include more choices please insert new keys in the corresponding *dict* included in [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py) with a *dict value* non already used in the same *dict*, and complying with the rules described in the *Inputs* chapter of this card for each parameter.

  ### Inputs

- - **metric_selector** *(str or os.PathLike)*: *Evaluation module identifier* on the HuggingFace evaluate repo or local path to the metric script. The metric must include, in addition to the conventinal *predictions* and *references* inputs a *kwarg* named *prediction_strategies (list(str))*. The metric must be able to handle a family of prediction strategies which must be included within the options lists for the parameter *prediction_strategy_selector*. Also the metric defines the minimum characteristics of the data comprised in the *test_dataset* parameter and at least adecuated test dataset must included within the options lists for the parameter *test_dataset*. The app currently includes the following metrics:

- - [gorkaartola/metric_for_tp_fp_samples](https://huggingface.co/spaces/gorkaartola/metric_for_tp_fp_samples). This metric is specially designed to measure the performance of sentece classification models over multiclass test datasets containing both True Positive samples, meaning that the label associated to the sentence in the sample is correctly assigned, and False Positive samples, meaning that the label associated to the sentence in the sample is incorrectly assigned.
-
- The *prediction_strategies* implemented in this metric are:
- - *argmax*, which takes the highest value of the softmax inference logits to select the prediction.
- - *threshold*, which takes all softmax inference logits above a certain value to select the predictions.
- - *topk*, which takes the highest *k* softmax inference logits to select the predictions.

- The minimum fields required by this metric for the test datasets are the following:
- - *title* containing the first sentence to be compared with diferent queries representing each class.
- - *label_ids* containing the *id* of the class the sample refers to. Including samples of all the classes is advised.
- - *nli_label* which is '0' if the sample represents a True Positive or '2' if the sample represents a False Positive, meaning that the *label_ids* is incorrectly assigned to the *title*. Including both True Positive and False Positive samples for all classes is advised.

- Example:

- |title |label_ids |nli_label |
- |-----------------------------------------------------------------------------------|:---------:|:----------:|
- |'Together we can save the arctic': celebrity advocacy and the Rio Earth Summit 2012| 8 | 0 |
- |Tuple-based semantic and structural mapping for a sustainable interoperability | 16 | 2 |

- - **test_dataset** *(str or os.PathLike)*: the *name* of a dataset hosted inside a repo on huggingface.co or path to a directory containing the dataset locally. The data used for testing will be only the one included in the *test* split of the dataset, contains the samples of the first sentence to be classified by NLI comparison with different queries and the minimum additional data fields for each sample is defined by the requirements of the selected metric among the ones described above.

- - [gorkaartola/SC-ZS-test_AURORA-Gold-SDG_True-Positives-and-False-Positives](https://huggingface.co/datasets/gorkaartola/SC-ZS-test_AURORA-Gold-SDG_True-Positives-and-False-Positives). This dataset is shaped to be used with the metric [gorkaartola/metric_for_tp_fp_samples](https://huggingface.co/spaces/gorkaartola/metric_for_tp_fp_samples) and includes both Gold True Positive samples and Gold False Positive samples of titles of scientific papers which are associated to a particular [Sustainable Development Goal of the UN](https://sdgs.un.org/goals) or SDG (True Positives) or are not associated to a particular SDG (False Positives). These assignations are no excluding, being possible that the titles be also related to other SDGs or not related to other SDGs. The data has been hand annotated by experts in the framework of the [AURORA Project](https://sites.google.com/vu.nl/aurora-sdg-research-dashboard/deliverables?authuser=0#h.5lufepxyapac) and published in the paper [Evaluation on accuracy of mapping science to the United Nations' Sustainable Development Goals (SDGs) of the Aurora SDG queries](https://zenodo.org/record/4917171#.YvaH5DqxVH4). The structure of the data includida in the dataset is the following:

  |SDG |True Positive Samples|False Positive Samples |
  |:-------:|:-------------------:|:---------------------:|
@@ -65,9 +59,7 @@ Please define the configuration to be tested selecting one of the avalable optio
  |17 |29 |29 |
  |Total |1,043 |1,043 |

- - **model_selector** *(str or os.PathLike)*: the *model id* of a pretrained model hosted inside a model repo on huggingface.co or path to a directory containing model weights saved using save_pretrained() to models for [Zero-Shot Classification with transformers](https://huggingface.co/models?pipeline_tag=zero-shot-classification&sort=downloads), fine-tuned models based on them, or in general models trained to perform classification through natural language inference NLI.
-
- - **queries_selector** *(str)*: combination of the the *name* of a dataset hosted inside a repo on huggingface.co followed by "-" and the name of the file that contains the queries to build the second sentence for the NLI comparison together with a *prompt*. The dataset file must include at least the following fields:
  - *query*, where each sample contains a sentence related to a certain class. One class may have multiple query sentences in the dataset; the query selected after inference to measure the *entailment* or *contradiction* of the sentences to be classified with each particular class is the one whose inference softmax logit is the highest among all queries associated with that class.
  - *label_ids*, containing the identification of the class associated to each query.

@@ -89,7 +81,7 @@ Please define the configuration to be tested selecting one of the avalable optio
  |participation of women | 4 |
  |women’s rights | 4 |

- For the representation of each query dataset in the file [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py), each dataset *name* is includad as a new *dict_key* containing as value also a *dict* containing the names of diferent .csv files with the queries data included in the dataset as shown below.

  queries = {
  'gorkaartola/SDG_queries':
@@ -116,7 +108,21 @@ Please define the configuration to be tested selecting one of the avalable optio
  - *None*.
  - *'This is '*.
  - *'The subject is '*, particularly included for the queries of the *SDG_targets.csv* file.
- - *'The Sustainable Development Goal is '* particlarly included for the queries of the *SDG_Numbers.csv* file.

  - **prediction_strategy_selector** *(str)*: identifiers of the strategies implemented in the corresponding *metric* for the selection of the predictions. The strategy choices currently included are:
  + For the [gorkaartola/metric_for_tp_fp_samples](https://huggingface.co/spaces/gorkaartola/metric_for_tp_fp_samples) metric:
@@ -125,7 +131,7 @@ Please define the configuration to be tested selecting one of the avalable optio
  + *topk*: with values of 3, 5, 7 and 9. To add a new *topk* strategy, a new key can be introduced in the *prediction_strategy_options dict* of the [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py) file with a list as follows: *["topk", desired value]*.

  ### Output
- The output is a .csv file that includes, for every prediction strategy selected, a detailed table of results including, recall, precission, f1-score and accuracy of the predictions for each SDG, and both overall micro and macro averages. If cloned and run in local, this file is saved in the *Reports* folder and a file with the calculated entailment softmax logits for each query is also saved in the *Reports/ZS inference tables* folder. The files included in the Huggingface repo have been uploaded as examples of the results obtained from the app.

  ## Limitations and Bias
  Please refer to the limitations and bias of the models used for the inference.
@@ -164,9 +170,9 @@ BibLaTex
  ## Citation
  BibLaTeX
  ```
- @online{ZS_SDGs,
  author = {Gorka Artola},
- title = {Testing Zero Shot Classification by SDGs},
  year = 2022,
  url = {https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs},
  urldate = {2022-08-11}
 
  ---
+ title: Zero Shot Classifier Tester for True Positive and False Positive Samples
  emoji: 📊
  colorFrom: purple
  colorTo: yellow

  pinned: false
  ---

+ # Card for Zero_Shot_Classifier_Tester_for_TP_FP

  ## Description
+ With this app you can test and compare different zero-shot approaches to classifying sentences through Natural Language Inference (NLI): each sentence to be classified (sentence 1) is compared with sentences (sentence 2) built by combining prompts with the different queries that represent each class, evaluating the *entailment* or *contradiction* between sentence 1 and sentence 2. It is particularly intended for test datasets that contain both True Positive and False Positive samples.

  ## How to Use
+ Please define the configuration to be tested by selecting one of the available options for *model_selector*, *test_dataset*, *queries_selector*, *prompt_selector* and *metric_selector*, and as many choices of the available *prediction_strategy_selector* as desired. The available input choices for each parameter are included in the file [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py), or in its clone if you run the app locally, and can be extended with further possibilities by including them in the same file. To include more choices, insert new keys in the corresponding *dict* of the [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py) file, with a *dict value* not already used in the same *dict* and complying with the rules described in the *Inputs* chapter of this card for each parameter, as sketched below.
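A minimal sketch of such an extension (the *prediction_strategy_options* name is quoted later in this card; *prompt_options* and every concrete key and value here are illustrative assumptions, so check the real file):

```python
# options.py -- illustrative sketch of adding new choices.
# 'prediction_strategy_options' is named elsewhere in this card;
# 'prompt_options' and all concrete entries are assumptions.

prompt_options = {
    "None": None,
    "This is ": "This is ",
    "The topic is ": "The topic is ",  # hypothetical new choice
}

# topk entries take the form ["topk", desired value], as described
# in the prediction_strategy_selector section below.
prediction_strategy_options = {
    "argmax": ["argmax"],
    "topk 3": ["topk", 3],
    "topk 11": ["topk", 11],  # hypothetical new choice
}
```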

  ### Inputs

+ - **model_selector** *(str or os.PathLike)*: the *model id* of a pretrained model hosted inside a model repo on huggingface.co, or the path to a directory containing model weights saved using save_pretrained(). Valid choices point to models for [Zero-Shot Classification with transformers](https://huggingface.co/models?pipeline_tag=zero-shot-classification&sort=downloads), fine-tuned models based on them, or in general models trained to perform classification through Natural Language Inference (NLI).

+ - **test_dataset** *(str or os.PathLike)*: the *name* of a dataset hosted inside a repo on huggingface.co, or the path to a directory containing the dataset locally. Only the *test* split of the dataset is used for testing; it contains the samples of the first sentence (sentence 1) to be classified by NLI comparison with different queries, and each sample must include at least the following data fields:
+ - *title*, containing the first sentence to be compared with the different queries representing each class.
+ - *label_ids*, containing the *id* of the class the sample refers to. Including samples of all the classes is advised.
+ - *nli_label*, which is '0' if the sample represents a True Positive or '2' if the sample represents a False Positive, meaning that the *label_ids* is incorrectly assigned to the *title*. Including both True Positive and False Positive samples for all classes is advised.

+ Example:

+ |title |label_ids |nli_label |
+ |-----------------------------------------------------------------------------------|:---------:|:----------:|
+ |'Together we can save the arctic': celebrity advocacy and the Rio Earth Summit 2012| 8 | 0 |
+ |Tuple-based semantic and structural mapping for a sustainable interoperability | 16 | 2 |
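To check that a test dataset exposes these fields, its *test* split can be inspected with the 🤗 *datasets* library; a minimal sketch, using the dataset option listed below:

```python
# Minimal sketch: load the test split and inspect the required fields.
from datasets import load_dataset

ds = load_dataset(
    "gorkaartola/SC-ZS-test_AURORA-Gold-SDG_True-Positives-and-False-Positives",
    split="test",
)
sample = ds[0]
print(sample["title"], sample["label_ids"], sample["nli_label"])
```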

+ Currently the available dataset options in the app are:

+ - [gorkaartola/SC-ZS-test_AURORA-Gold-SDG_True-Positives-and-False-Positives](https://huggingface.co/datasets/gorkaartola/SC-ZS-test_AURORA-Gold-SDG_True-Positives-and-False-Positives). This dataset is shaped to be used with the metric [gorkaartola/metric_for_tp_fp_samples](https://huggingface.co/spaces/gorkaartola/metric_for_tp_fp_samples) and includes both Gold True Positive and Gold False Positive samples of titles of scientific papers that are associated with a particular [Sustainable Development Goal of the UN](https://sdgs.un.org/goals) or SDG (True Positives) or are not associated with a particular SDG (False Positives). These assignments are not exclusive: a title may also be related, or unrelated, to other SDGs. The data has been hand-annotated by experts in the framework of the [AURORA Project](https://sites.google.com/vu.nl/aurora-sdg-research-dashboard/deliverables?authuser=0#h.5lufepxyapac) and published in the paper [Evaluation on accuracy of mapping science to the United Nations' Sustainable Development Goals (SDGs) of the Aurora SDG queries](https://zenodo.org/record/4917171#.YvaH5DqxVH4). The structure of the data included in the dataset is the following:

  |SDG |True Positive Samples|False Positive Samples |
  |:-------:|:-------------------:|:---------------------:|

  |17 |29 |29 |
  |Total |1,043 |1,043 |

+ - **queries_selector** *(str)*: combination of the *name* of a dataset hosted inside a repo on huggingface.co followed by "-" and the name of the .csv file that contains the queries used to build the second sentence (sentence 2) for the NLI comparison together with a *prompt*. The dataset file must include at least the following fields:
  - *query*, where each sample contains a sentence related to a certain class. One class may have multiple query sentences in the dataset; the query selected after inference to measure the *entailment* or *contradiction* of the sentences to be classified with each particular class is the one whose inference softmax logit is the highest among all queries associated with that class.
  - *label_ids*, containing the identification of the class associated to each query.

  |participation of women | 4 |
  |women’s rights | 4 |

+ For the representation of each query dataset in the file [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py), each dataset *name* is included as a new *dict_key* whose value is also a *dict* containing the names of the different .csv files with the queries data included in the dataset, as shown below.

  queries = {
  'gorkaartola/SDG_queries':

  - *None*.
  - *'This is '*.
  - *'The subject is '*, particularly included for the queries of the *SDG_targets.csv* file.
+ - *'The Sustainable Development Goal is '*, particularly included for the queries of the *SDG_Numbers.csv* file.
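Sentence 2 is formed by combining a prompt with each query; a minimal sketch (plain string concatenation is assumed here for illustration, not necessarily the app's exact code):

```python
# Minimal sketch: build sentence 2 candidates from a prompt and queries.
prompt = "The Sustainable Development Goal is "
queries = ["gender equality", "climate action"]  # illustrative queries

hypotheses = [prompt + query for query in queries]
# Each hypothesis is paired with sentence 1 (the title) and scored for
# entailment by the NLI model; the best-scoring query represents its class.
```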
+
+ - **metric_selector** *(str or os.PathLike)*: the *evaluation module identifier* on the HuggingFace evaluate repo, or the local path to a metric script. The metric must accept the following inputs (an invocation sketch follows this list):
+ - *predictions* *(numpy.array(float32)[sentences to classify, number of classes])*: numpy array with the softmax logit values of the entailment dimension of the NLI inference on the sentences to be classified, for each class.
+ - *references* *(numpy.array(int32)[sentences to classify, 2])*: numpy array with the reference *label_ids* and *nli_label* of the sentences to be classified, given in the *test_dataset*.
+ - a *kwarg* named *prediction_strategies* *(list(str))*. The metric must be able to handle a family of prediction strategies, which must be included within the option lists for the parameter *prediction_strategy_selector* in the [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py) file.
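A minimal invocation sketch with the *evaluate* library, assuming the input shapes described above (the exact *prediction_strategies* format should be checked against the metric script and *options.py*):

```python
import numpy as np
import evaluate

# Entailment softmax logits for 2 sentences across 3 classes (illustrative).
predictions = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]], dtype=np.float32)
# One row per sentence: [label_ids, nli_label], as described above.
references = np.array([[0, 0],
                       [2, 2]], dtype=np.int32)

metric = evaluate.load("gorkaartola/metric_for_tp_fp_samples")
results = metric.compute(predictions=predictions,
                         references=references,
                         prediction_strategies=["argmax"])
```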
+
+ The app currently includes the following metrics:
+
+ - [gorkaartola/metric_for_tp_fp_samples](https://huggingface.co/spaces/gorkaartola/metric_for_tp_fp_samples). This metric is specially designed to measure the performance of sentence classification models over multiclass test datasets containing both True Positive samples, meaning that the label associated with the sentence in the sample is correctly assigned, and False Positive samples, meaning that the label associated with the sentence in the sample is incorrectly assigned.
+
+ The *prediction_strategies* implemented in this metric are the following (a numpy sketch follows the list):
+ - *argmax*, which takes the highest value of the softmax inference logits to select the prediction.
+ - *threshold*, which takes all softmax inference logits above a certain value to select the predictions.
+ - *topk*, which takes the highest *k* softmax inference logits to select the predictions.
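A minimal numpy sketch of the three selection rules (illustrative, not the metric's actual implementation):

```python
import numpy as np

logits = np.array([0.05, 0.40, 0.25, 0.30])  # entailment softmax per class

argmax_pred = [int(np.argmax(logits))]                # -> [1]
threshold_pred = np.where(logits > 0.28)[0].tolist()  # -> [1, 3]
topk_pred = np.argsort(logits)[::-1][:2].tolist()     # k=2 -> [1, 3]
```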

  - **prediction_strategy_selector** *(str)*: identifiers of the strategies implemented in the corresponding *metric* for the selection of the predictions. The strategy choices currently included are:
  + For the [gorkaartola/metric_for_tp_fp_samples](https://huggingface.co/spaces/gorkaartola/metric_for_tp_fp_samples) metric:

  + *topk*: with values of 3, 5, 7 and 9. To add a new *topk* strategy, a new key can be introduced in the *prediction_strategy_options dict* of the [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py) file with a list as follows: *["topk", desired value]*.

  ### Output
+ The output is a .csv file that includes, for every prediction strategy selected, a detailed table of results with the recall, precision, f1-score and accuracy of the predictions for each class, plus both overall micro and macro averages. If the app is cloned and run locally, this file is saved in the *Reports* folder, and a file with the calculated entailment softmax logits for each query is also saved in the *Reports/ZS inference tables* folder. The files included in the Huggingface repo have been uploaded as examples of the results obtained from the app.

  ## Limitations and Bias
  Please refer to the limitations and bias of the models used for the inference.

  ## Citation
  BibLaTeX
  ```
+ @online{ZS_classifier_tester,
  author = {Gorka Artola},
+ title = {Zero Shot Classifier Tester for True Positive and False Positive Samples},
  year = 2022,
  url = {https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs},
  urldate = {2022-08-11}