---
title: Zero Shot Classifier Tester for True Positive and False Positive Samples
emoji: 📊
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 3.1.4
app_file: app.py
pinned: false
---

Card for Zero_Shot_Classifier_Tester_for_TP_FP

Description

With this app you can test and compare different zero-shot approaches to classifying sentences through Natural Language Inference (NLI). Each sentence to be classified (sentence 1) is compared with sentences (sentence 2) built by combining a prompt with different queries that represent each class, and the entailment or contradiction between sentence 1 and sentence 2 is evaluated. The app is particularly aimed at the case where the test dataset contains both True Positive and False Positive samples.

How to Use

Define the configuration to be tested by selecting one of the available options for model_selector, test_dataset, queries_selector, prompt_selector and metric_selector, and as many choices of prediction_strategy_selector as desired. The available input choices for each parameter are defined in the file options.py (or in your clone of it if you run the app locally) and can be extended with further possibilities by editing that file. To add a new choice, insert a new key in the corresponding dict of options.py, give it a value not already used in that dict, and comply with the rules described in the Inputs chapter of this card for each parameter.
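
As an illustration, the snippet below sketches what such an extension of options.py might look like. Only the two dicts whose structure is documented in this card are shown (the queries dict and the predition_strategy_options dict); the key names of the strategy dict and the entries marked as new are hypothetical examples, not the actual contents of the file.

```python
# Illustrative sketch of extending the choice dicts in options.py.
# Check the actual options.py in the repository before editing;
# other dicts in the file may have a different structure.

queries = {
    'gorkaartola/SDG_queries': {
        'SDG_Titles.csv': '0',
        'SDG_Headlines.csv': '1',
        'SDG_Subjects.csv': '2',
        'SDG_Targets.csv': '3',
        'SDG_Numbers.csv': '4',
        'SDG_My_New_Queries.csv': '5',  # hypothetical new file, value '5' not used yet
    },
}

predition_strategy_options = {
    'argmax': ['argmax'],
    'threshold 0.05': ['threshold', 0.05],
    'threshold 0.10': ['threshold', 0.10],  # hypothetical new threshold strategy
    'topk 3': ['topk', 3],
    'topk 4': ['topk', 4],                  # hypothetical new topk strategy
}
```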

Inputs

  • model_selector (str or os.PathLike): the model id of a pretrained model hosted inside a model repo on huggingface.co, or the path to a directory containing model weights saved using save_pretrained(). It must point to a model for Zero-Shot Classification with transformers, a fine-tuned model based on one, or in general a model trained to perform classification through natural language inference (NLI); the first sketch after this list shows a minimal example of this kind of inference.

  • test_dataset (str or os.PathLike): the name of a dataset hosted inside a repo on huggingface.co or the path to a directory containing the dataset locally. Only the test split of the dataset is used for testing. It contains the samples of the first sentence (sentence 1) to be classified by NLI comparison with different queries, and the minimum data fields that each sample must have are the following:

    • title containing the first sentence to be compared with different queries representing each class.
    • label_ids containing the id of the class the sample refers to. Including samples of all the classes is advised.
    • nli_label which is '0' if the sample represents a True Positive or '2' if the sample represents a False Positive, meaning that the label_ids is incorrectly assigned to the title. Including both True Positive and False Positive samples for all classes is advised.

    Example:

    | title | label_ids | nli_label |
    |---|---|---|
    | 'Together we can save the arctic': celebrity advocacy and the Rio Earth Summit 2012 | 8 | 0 |
    | Tuple-based semantic and structural mapping for a sustainable interoperability | 16 | 2 |

    Currently the available dataset options in the app are:

    | SDG | True Positive Samples | False Positive Samples |
    |---|---|---|
    | 1 | 2 | 2 |
    | 2 | 30 | 30 |
    | 3 | 288 | 288 |
    | 4 | 87 | 87 |
    | 5 | 94 | 94 |
    | 6 | 62 | 62 |
    | 7 | 63 | 63 |
    | 8 | 17 | 17 |
    | 9 | 65 | 65 |
    | 10 | 31 | 31 |
    | 11 | 57 | 57 |
    | 12 | 48 | 48 |
    | 13 | 36 | 36 |
    | 14 | 17 | 17 |
    | 15 | 77 | 77 |
    | 16 | 40 | 40 |
    | 17 | 29 | 29 |
    | Total | 1,043 | 1,043 |
  • queries_selector (str): combination of the name of a dataset hosted inside a repo on huggingface.co, followed by "-" and the name of a csv file that contains the queries used, together with a prompt, to build the second sentence (sentence 2) for the NLI comparison. The dataset file must include at least the following fields:

    • query, containing in each sample a sentence related to a certain class. One class may have multiple query sentences in the dataset; after inference, the query used to measure the entailment or contradiction of the sentences to be classified with a particular class is the one whose softmax logit is the highest among all queries associated with that class (the first sketch after this list illustrates this selection).
    • label_ids, containing the id of the class associated with each query.

    Example:

    | query | label_ids |
    |---|---|
    | poverty | 0 |
    | poverty mitigation | 0 |
    | food shortage | 1 |
    | sustainable agriculture | 1 |
    | food security | 1 |
    | hunger | 1 |
    | public health | 2 |
    | health | 2 |
    | education | 3 |
    | right to education | 3 |
    | gender equality | 4 |
    | participation of women | 4 |
    | women’s rights | 4 |

    For the representation of each query dataset in the file options.py, each dataset name is included as a new dict key whose value is another dict containing the names of the different .csv files with the queries data included in the dataset, as shown below.

          queries = {
              'gorkaartola/SDG_queries':
                  {
                      'SDG_Titles.csv' : '0',
                      'SDG_Headlines.csv' : '1',
                      'SDG_Subjects.csv' : '2',
                      'SDG_Targets.csv' : '3',
                      'SDG_Numbers.csv' : '4',
                  }
              }
    

    Current query datasets and files are:

    • gorkaartola/SDG_queries: dataset containing different files with queries describing the UN SDGs taken from the UN Sustainable Development Goals Taxonomy.
      • SDG_Titles.csv: uses the title of each SDG as queries.
      • SDG_Headlines.csv: uses the headline of each SDG as queries.
      • SDG_Subjects.csv: uses each subject identified in each SDG as queries, making several queries for each SDG.
      • SDG_Targets.csv: uses each target comprised in each SDG as queries, making several queries for each SDG.
      • SDG_Numbers.csv: uses SDG + number as query for each SDG.
  • prompt_selector (str): prompt that will be added to each query as a prefix to build the second sentence for the NLI comparison, the first sentence being each sentence to be classified from the test dataset. Current prompts are:

    • None.
    • 'This is '.
    • 'The subject is ', included particularly for the queries of the SDG_Targets.csv file.
    • 'The Sustainable Development Goal is ', included particularly for the queries of the SDG_Numbers.csv file.
  • metric_selector (str or os.PathLike): evaluation module identifier on the HuggingFace evaluate repo or local path to a metric script (the last sketch after this list shows a minimal call). The metric must accept the following inputs:

    • predictions (numpy.array(float32)[sentences to classify, number of classes]): numpy array with the softmax logit values of the entailment dimension of the NLI inference on the sentences to be classified, for each class.
    • references (numpy.array(int32)[sentences to classify, 2]): numpy array with the reference label_ids and nli_label of the sentences to be classified, given in the test_dataset.
    • a kwarg named prediction_strategies (list): the metric must be able to handle a family of prediction strategies, which must be included within the options lists for the parameter prediction_strategy_selector in the options.py file.

    The app currently includes the following metrics:

    • gorkaartola/metric_for_tp_fp_samples. This metric is specially designed to measure the performance of sentence classification models over multiclass test datasets containing both True Positive samples, meaning that the label associated to the sentence in the sample is correctly assigned, and False Positive samples, meaning that the label associated to the sentence in the sample is incorrectly assigned.

      The prediction_strategies implemented in this metric are:

      • argmax, which takes the highest value of the softmax inference logits to select the prediction.
      • threshold, which takes all softmax inference logits above a certain value to select the predictions.
      • topk, which takes the highest k softmax inference logits to select the predictions.
  • prediction_strategy_selector (str): identifiers of the strategies implemented in the corresponding metric for the selection of the predictions (the second sketch after this list illustrates them). The strategy choices currently included are:

    • For the gorkaartola/metric_for_tp_fp_samples metric:
      • argmax
      • threshold: with values of 0.05, 0.25, 0.5 and 0.75. To add a new threshold strategy, a new key can be introduced in the predition_strategy_options dict of the options.py file with a list as its value, as follows: ["threshold", desired value].
      • topk: with values of 3, 5, 7 and 9. To add a new topk strategy, a new key can be introduced in the predition_strategy_options dict of the options.py file with a list as its value, as follows: ["topk", desired value].
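
The following sketches illustrate the mechanisms described above; they are not the app's actual code (see app.py for that), and the model name, prompt, queries and numbers used are assumptions chosen for the examples.

First, a minimal sketch of the NLI inference behind model_selector, queries_selector and prompt_selector: each sentence to classify is paired with prompt + query, the softmax logit of the entailment dimension is taken, and each class keeps its best-scoring query.

```python
import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "facebook/bart-large-mnli"  # illustrative NLI model choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

sentence1 = "Tuple-based semantic and structural mapping for a sustainable interoperability"
prompt = "This is "
queries = {"poverty": 0, "food security": 1, "public health": 2}  # query -> label_ids

entailment_id = model.config.label2id.get("entailment", 2)  # index of the entailment class
num_classes = max(queries.values()) + 1
best_per_class = np.zeros(num_classes, dtype=np.float32)

for query, label_id in queries.items():
    inputs = tokenizer(sentence1, prompt + query, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    # keep, for each class, the query with the highest entailment probability
    best_per_class[label_id] = max(best_per_class[label_id], probs[entailment_id].item())

print(best_per_class)  # one entailment score per class for this sentence
```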
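
Second, a minimal sketch of how the argmax, threshold and topk prediction strategies described for gorkaartola/metric_for_tp_fp_samples can turn those entailment scores into predicted classes; the metric's actual implementation may differ in detail.

```python
import numpy as np

def select_predictions(predictions: np.ndarray, strategy: list) -> list:
    """Return, for each sentence (row), the selected class ids under the given strategy."""
    name = strategy[0]
    if name == "argmax":
        # single highest entailment score per sentence
        return [np.array([row.argmax()]) for row in predictions]
    if name == "threshold":
        # every class whose score is above the given value
        return [np.flatnonzero(row > strategy[1]) for row in predictions]
    if name == "topk":
        # the k classes with the highest scores
        return [np.argsort(row)[::-1][:strategy[1]] for row in predictions]
    raise ValueError(f"Unknown prediction strategy: {name}")

scores = np.array([[0.10, 0.70, 0.30],
                   [0.60, 0.20, 0.80]], dtype=np.float32)
print(select_predictions(scores, ["argmax"]))
print(select_predictions(scores, ["threshold", 0.25]))
print(select_predictions(scores, ["topk", 2]))
```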
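
Finally, a minimal sketch of calling a metric with the interface described for metric_selector; the strategies are assumed to be passed in the ["name", value] form used in options.py, which should be verified against the metric's own documentation.

```python
import evaluate
import numpy as np

# predictions: entailment scores per sentence and class (see the first sketch above)
predictions = np.array([[0.10, 0.70, 0.30],
                        [0.60, 0.20, 0.80]], dtype=np.float32)
# references: [label_ids, nli_label] per sentence; nli_label 0 = True Positive, 2 = False Positive
references = np.array([[1, 0],
                       [2, 2]], dtype=np.int32)

metric = evaluate.load("gorkaartola/metric_for_tp_fp_samples")
results = metric.compute(
    predictions=predictions,
    references=references,
    prediction_strategies=[["argmax"], ["threshold", 0.25], ["topk", 2]],
)
print(results)
```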

Output

The output is a .csv file that includes, for every prediction strategy selected, a detailed table of results with the recall, precision, F1-score and accuracy of the predictions for each class, as well as the overall micro and macro averages. If the app is cloned and run locally, this file is saved in the Reports folder, and a file with the calculated entailment softmax logits for each query is also saved in the Reports/ZS inference tables folder. The files included in the Huggingface repo have been uploaded as examples of the results obtained from the app.
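
As a reminder of how the two averages in the report differ, the snippet below is an illustrative calculation (with made-up per-class counts, not taken from the app) of macro averaging, which averages the per-class scores, versus micro averaging, which pools the counts over all classes first.

```python
import numpy as np

# Purely illustrative per-class counts for three classes:
tp = np.array([80, 10, 5])   # true positives per class
fp = np.array([20,  5, 5])   # false positives per class
fn = np.array([10, 15, 5])   # false negatives per class

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

macro_f1 = f1.mean()                                # unweighted mean of per-class F1
micro_precision = tp.sum() / (tp.sum() + fp.sum())  # counts pooled over all classes
micro_recall = tp.sum() / (tp.sum() + fn.sum())
micro_f1 = 2 * micro_precision * micro_recall / (micro_precision + micro_recall)

print(f"macro F1: {macro_f1:.3f}  micro F1: {micro_f1:.3f}")
```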

Limitations and Bias

Please refer to the limitations and bias of the models used for the inference.

References

BibLaTeX

@misc{schmidt_felix_2021_4964606,
  author       = {Schmidt, Felix and
                  Vanderfeesten, Maurice},
  title        = {{Evaluation on accuracy of mapping science to the 
                   United Nations' Sustainable Development Goals
                   (SDGs) of the Aurora SDG queries}},
  month        = jun,
  year         = 2021,
  note         = {{Funded by European Commission, Project ID: 101004013,
                   Call: EAC-A02-2019-1, Programme: EPLUS2020, DG/Agency: EACEA
                   (https://ec.europa.eu/info/funding-tenders/opportunities/portal/screen/how-to-participate/org-details/999880560/project/101004013/program/31114387/details).
                   Project website: https://aurora-network.global/project/sdg-analysis-bibliometrics-relevance/ |
                   Zenodo Community: https://zenodo.org/communities/aurora-universities-network |
                   Github: https://github.com/Aurora-Network-Global/sdg-queries-evaluation-report}},
  publisher    = {Zenodo},
  version      = {v1.0.2},
  doi          = {10.5281/zenodo.4964606},
  url          = {https://doi.org/10.5281/zenodo.4964606}
}

Citation

BibLaTeX

@online{ZS_classifier_tester,
  author = {Gorka Artola},
  title = {Zero Shot Classifier Tester for True Positive and False Positive Samples},
  year = 2022,
  url = {https://huggingface.co/spaces/gorkaartola/Zero_Shot_Classifier_by_SDGs},
  urldate = {2022-08-11}
}