---
title: Zero Shot Classifier Tester for True Positive and False Positive Samples
emoji: 📊
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 3.1.4
app_file: app.py
pinned: false
---

# Card for Zero_Shot_Classifier_Tester_for_TP_FP

## Description

With this app you can test and compare different zero-shot approaches to classifying sentences through Natural Language Inference (NLI). Each sentence to be classified (sentence 1) is compared with sentences (sentence 2) built by combining, with prompts, different queries that represent each class, evaluating the *entailment* or *contradiction* between sentence 1 and sentence 2. The app is particularly aimed at the case where the dataset contains both True Positive and False Positive samples.

## How to Use

Define the configuration to be tested by selecting one of the available options for *model_selector*, *test_dataset*, *queries_selector*, *prompt_selector* and *metric_selector*, and as many of the available *prediction_strategy_selector* choices as desired.

The available input choices for each parameter are defined in the file [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py), or its clone if the app is run locally, and can be extended with further possibilities by adding them to the same file. To include more choices, insert new keys in the corresponding *dict* of the [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py) file, with a *dict value* not already used in the same *dict* and complying with the rules described in the *Inputs* chapter of this card for each parameter.

### Inputs

- **model_selector** *(str or os.PathLike)*: the *model id* of a pretrained model hosted in a model repo on huggingface.co, or the path to a directory containing model weights saved using save_pretrained(). It may point to models for [Zero-Shot Classification with transformers](https://huggingface.co/models?pipeline_tag=zero-shot-classification&sort=downloads), fine-tuned models based on them, or in general models trained to perform classification through natural language inference (NLI).
- **test_dataset** *(str or os.PathLike)*: the *name* of a dataset hosted in a repo on huggingface.co, or the path to a directory containing the dataset locally. Only the data included in the *test* split of the dataset is used for testing. It contains the samples of the first sentence (sentence 1) to be classified by NLI comparison with the different queries. Each sample must have at least the following fields (see the loading sketch and example below):
  - *title*, containing the first sentence to be compared with the different queries representing each class.
  - *label_ids*, containing the *id* of the class the sample refers to. Including samples of all the classes is advised.
  - *nli_label*, which is '0' if the sample represents a True Positive, or '2' if the sample represents a False Positive, meaning that the *label_ids* is incorrectly assigned to the *title*. Including both True Positive and False Positive samples for all classes is advised.
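  As a minimal loading sketch (assuming the `datasets` library is installed; the dataset name is one of the options listed below, and the field names follow the description above):

  ```python
  from datasets import load_dataset

  # Only the test split is used by the app.
  dataset = load_dataset(
      "gorkaartola/SC-ZS-test_AURORA-Gold-SDG_True-Positives-and-False-Positives",
      split="test",
  )

  # Each sample must expose at least these three fields.
  sample = dataset[0]
  print(sample["title"], sample["label_ids"], sample["nli_label"])
  ```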
  Example:

  |title |label_ids |nli_label |
  |------|:--------:|:--------:|
  |'Together we can save the arctic': celebrity advocacy and the Rio Earth Summit 2012|8|0|
  |Tuple-based semantic and structural mapping for a sustainable interoperability|16|2|

  Currently the available dataset options in the app are:

  - [gorkaartola/SC-ZS-test_AURORA-Gold-SDG_True-Positives-and-False-Positives](https://huggingface.co/datasets/gorkaartola/SC-ZS-test_AURORA-Gold-SDG_True-Positives-and-False-Positives). This dataset is shaped to be used with the metric [gorkaartola/metric_for_tp_fp_samples](https://huggingface.co/spaces/gorkaartola/metric_for_tp_fp_samples) and includes both Gold True Positive and Gold False Positive samples of titles of scientific papers which are associated with a particular [Sustainable Development Goal of the UN](https://sdgs.un.org/goals), or SDG (True Positives), or are not associated with a particular SDG (False Positives). These assignments are not exclusive: a title may also be related, or unrelated, to other SDGs. The data has been hand-annotated by experts in the framework of the [AURORA Project](https://sites.google.com/vu.nl/aurora-sdg-research-dashboard/deliverables?authuser=0#h.5lufepxyapac) and published in the paper [Evaluation on accuracy of mapping science to the United Nations' Sustainable Development Goals (SDGs) of the Aurora SDG queries](https://zenodo.org/record/4917171#.YvaH5DqxVH4). The structure of the data included in the dataset is the following:

    |SDG |True Positive Samples|False Positive Samples|
    |:---:|:------------------:|:--------------------:|
    |1|2|2|
    |2|30|30|
    |3|288|288|
    |4|87|87|
    |5|94|94|
    |6|62|62|
    |7|63|63|
    |8|17|17|
    |9|65|65|
    |10|31|31|
    |11|57|57|
    |12|48|48|
    |13|36|36|
    |14|17|17|
    |15|77|77|
    |16|40|40|
    |17|29|29|
    |Total|1,043|1,043|

- **queries_selector** *(str)*: combination of the *name* of a dataset hosted in a repo on huggingface.co, followed by "-" and the name of a csv file in that dataset containing the queries used to build the second sentence (sentence 2) for the NLI comparison together with a *prompt*. The dataset file must include at least the following fields:
  - *query*, where each sample contains a sentence related to a certain class. One class may have multiple query sentences in the dataset; the one selected after inference to measure the *entailment* or *contradiction* of each sentence to be classified with a particular class is the query whose entailment softmax logit is the highest among all queries associated with that class (see the sketch below).
  - *label_ids*, containing the identification of the class associated with each query.
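  A minimal sketch of this per-class selection (array shapes and variable names are illustrative, not the app's actual code):

  ```python
  import numpy as np

  # Entailment softmax logits for every (sentence, query) pair,
  # plus the class id of each query; the shapes are hypothetical.
  num_classes = 17
  query_label_ids = np.repeat(np.arange(num_classes), 3)     # 3 queries per class
  query_logits = np.random.rand(100, query_label_ids.size)   # [sentences, queries]

  # For each class, keep the best-scoring query of that class.
  predictions = np.stack(
      [query_logits[:, query_label_ids == c].max(axis=1) for c in range(num_classes)],
      axis=1,
  )  # -> [sentences, classes]
  ```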
  Example:

  |query |label_ids|
  |------|:-------:|
  |poverty|0|
  |poverty mitigation|0|
  |food shortage|1|
  |sustainable agriculture|1|
  |food security|1|
  |hunger|1|
  |public health|2|
  |health|2|
  |education|3|
  |right to education|3|
  |gender equality|4|
  |participation of women|4|
  |women’s rights|4|

  For the representation of each query dataset in the file [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py), each dataset *name* is included as a new *dict key* whose value is also a *dict* containing the names of the different .csv files with the queries data included in the dataset, as shown below:

  ```python
  queries = {
      'gorkaartola/SDG_queries': {
          'SDG_Titles.csv': '0',
          'SDG_Headlines.csv': '1',
          'SDG_Subjects.csv': '2',
          'SDG_Targets.csv': '3',
          'SDG_Numbers.csv': '4',
      }
  }
  ```

  Current query datasets and files are:

  - *gorkaartola/SDG_queries*: dataset containing different files with queries describing the UN SDGs, taken from the [UN Sustainable Development Goals Taxonomy](http://metadata.un.org/sdg/?lang=en).
    - *SDG_Titles.csv*: uses the title of each SDG as query.
    - *SDG_Headlines.csv*: uses the headline of each SDG as query.
    - *SDG_Subjects.csv*: uses each subject identified in each SDG as a query, making several queries per SDG.
    - *SDG_Targets.csv*: uses each target comprised in each SDG as a query, making several queries per SDG.
    - *SDG_Numbers.csv*: uses *SDG + number* as the query for each SDG.

- **prompt_selector** *(str)*: prompt that will be added to each query as a prefix to build the second sentence for the NLI comparison, the first sentence being each sentence to be classified from the test dataset. Current prompts are:
  - *None*.
  - *'This is '*.
  - *'The subject is '*, particularly included for the queries of the *SDG_Targets.csv* file.
  - *'The Sustainable Development Goal is '*, particularly included for the queries of the *SDG_Numbers.csv* file.
- **metric_selector** *(str or os.PathLike)*: *Evaluation module identifier* on the HuggingFace evaluate repo, or local path to a metric script. The metric must accept the following inputs:
  - *predictions* *(numpy.array(float32)[sentences to classify, number of classes])*: numpy array with the softmax logits values of the entailment dimension of the NLI inference on the sentences to be classified, for each class.
  - *references* *(numpy.array(int32)[sentences to classify, 2])*: numpy array with the reference *label_ids* and *nli_label* of the sentences to be classified, given in the *test_dataset*.
  - A *kwarg* named *prediction_strategies* *(list(str))*. The metric must be able to handle a family of prediction strategies, which must be included within the options lists for the parameter *prediction_strategy_selector* in the [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py) file.

  The app currently includes the following metrics:

  - [gorkaartola/metric_for_tp_fp_samples](https://huggingface.co/spaces/gorkaartola/metric_for_tp_fp_samples). This metric is specially designed to measure the performance of sentence classification models over multiclass test datasets containing both True Positive samples, meaning that the label associated with the sentence in the sample is correctly assigned, and False Positive samples, meaning that the label associated with the sentence in the sample is incorrectly assigned. A calling sketch is shown below.
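    A hedged sketch of calling such a metric with the `evaluate` library (the strategy list format follows this card's description and should be checked against the metric script itself):

    ```python
    import evaluate
    import numpy as np

    metric = evaluate.load("gorkaartola/metric_for_tp_fp_samples")

    # Hypothetical toy inputs: 4 sentences, 17 classes.
    predictions = np.random.rand(4, 17).astype(np.float32)    # entailment softmax logits
    references = np.array([[3, 0], [3, 2], [8, 0], [16, 2]],  # [label_ids, nli_label] pairs
                          dtype=np.int32)

    results = metric.compute(
        predictions=predictions,
        references=references,
        prediction_strategies=[["argmax"], ["threshold", 0.5], ["topk", 5]],
    )
    print(results)
    ```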
    The *prediction_strategies* implemented in this metric are:

    - *argmax*, which takes the highest value of the softmax inference logits to select the prediction.
    - *threshold*, which takes all softmax inference logits above a certain value to select the predictions.
    - *topk*, which takes the *k* highest softmax inference logits to select the predictions.

- **prediction_strategy_selector** *(str)*: identifiers of the strategies implemented in the corresponding *metric* for the selection of the predictions. The strategy choices currently included are:
  - For the [gorkaartola/metric_for_tp_fp_samples](https://huggingface.co/spaces/gorkaartola/metric_for_tp_fp_samples) metric:
    - *argmax*.
    - *threshold*, with values of 0.05, 0.25, 0.5 and 0.75. To add a new *threshold* strategy, a new key can be introduced in the *predition_strategy_options dict* of the [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py) file with a list as follows: *["threshold", desired value]*.
    - *topk*, with values of 3, 5, 7 and 9. To add a new *topk* strategy, a new key can be introduced in the *predition_strategy_options dict* of the [options.py](https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs/blob/main/options.py) file with a list as follows: *["topk", desired value]*.

### Output

The output is a .csv file that includes, for every prediction strategy selected, a detailed table of results with the recall, precision, f1-score and accuracy of the predictions for each class, and both overall micro and macro averages. If the app is cloned and run locally, this file is saved in the *Reports* folder, and a file with the calculated entailment softmax logits for each query is also saved in the *Reports/ZS inference tables* folder. The files included in the Huggingface repo have been uploaded as examples of the results obtained from the app.

## Limitations and Bias

Please refer to the limitations and bias of the models used for the inference.

## References

BibLaTeX

```
@misc{schmidt_felix_2021_4964606,
  author       = {Schmidt, Felix and Vanderfeesten, Maurice},
  title        = {{Evaluation on accuracy of mapping science to the United Nations' Sustainable Development Goals (SDGs) of the Aurora SDG queries}},
  month        = jun,
  year         = 2021,
  note         = {{

Funded by European Commission, Project ID: 101004013, Call: EAC-A02-2019-1, Programme: EPLUS2020, DG/Agency: EACEA

[ Project website | Zenodo Community | Github ]

  }},
  publisher    = {Zenodo},
  version      = {v1.0.2},
  doi          = {10.5281/zenodo.4964606},
  url          = {https://doi.org/10.5281/zenodo.4964606}
}
```

## Citation

BibLaTeX

```
@online{ZS_classifier_tester,
  author  = {Gorka Artola},
  title   = {Zero Shot Classifier Tester for True Positive and False Positive Samples},
  year    = 2022,
  url     = {https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs},
  urldate = {2022-08-11}
}
```