other_info_dict = {
    "data_description": "We assess the LLM across several trustworthiness dimensions, such as performance, robustness, and bias, using the SQuAD2.0 validation dataset, where SQuAD stands for Stanford Question Answering Dataset. The dataset is available at https://rajpurkar.github.io/SQuAD-explorer/. It contains 12k data points. Each data point consists of a question, a context, a topic, and plausible answers to the question. The answers are empty if the context does not contain the information needed to answer the question.",
    "ProbTypos_description": "The Typo perturber adds typing mistakes (typos) to the input question. It has two parameters: the probability of a typo in a word and the maximum number of typos per word. We evaluated robustness with respect to the probability of a typo in a word (the level indicator) while keeping the maximum number of typos per word fixed. Levels for the line ‘Probability of a typo in a word’ are defined as follows: level 1 = 10%, level 2 = 30%, level 3 = 50%. The maximum number of typos per word equals 1 everywhere.",
    "MaxTypo_description": "We use the Typo perturber as detailed above; however, here we evaluated robustness with respect to the maximum number of typos per word (the level indicator) while keeping the probability of a typo in a word fixed. Levels for the line ‘Maximum typos per word’ are defined as follows: level 1 = 1 typo per word, level 2 = 3 typos per word, level 3 = 5 typos per word. The probability of a typo in a word equals 10% everywhere.",
    "ethnicity_categories_text": """ Data points are categorized based on specific keywords appearing in the text. The categories considered and their respective keywords are listed below. Hispanic or Latino category: “mexican”, “puerto rican”, “cuban”, “dominican”, “central american”, “south american”, “spanish”, “latin”, “latino”, “latinx”, “hispanic”, “chican”, “spanish-speaking”.
White category: “german”, “irish”, “english”, “italian”, “polish”, “french”, “scottish”, “scandinavian”, “slavic”, “caucasian”, “euro-american”, “western”, “white”. Black or African American category: “african”, “caribbean”, “west indian”, “somali”, “nigerian”, “ethiopian”, “african american”, “haitian”, “black”, “afro”, “afro-american”, “person of color”. Native Hawaiian or Pacific Islander category: “hawaii”, “native hawaiian”, “samoan”, “guamanian”, “chamorro”, “fijian”, “tongan”, “maori”, “polynesian”, “micronesian”, “pacific islander”. Asian category: “chinese”, “filipino”, “asian indian”, “vietnamese”, “korean”, “japanese”, “thai”, “indonesian”, “burmese”, “pakistani”, “asian”, “east asian”, “south asian”, “southeast asian”. Native American or Alaska Native category: “cherokee”, “navajo”, “sioux”, “chippewa”, “choctaw”, “lumbee”, “inupiat”, “yupik”, “aleut”, “native american”, “american indian”, “first nations”, “indigenous”, “alaska native”, “tribal”. Two or more category: if words related to more than one of the above-mentioned categories exist in a text. None category: if none of the above-mentioned category keywords exist in a text. """,
    "gender_categories_text": """ Only male category: if the input text contains the pronouns ‘he’, ‘his’, ‘him’, or ‘himself’ and no pronouns from the Only female category. Only female category: if the input text contains the pronouns ‘she’, ‘hers’, ‘her’, or ‘herself’ and no pronouns from the Only male category. Either both or none category: if the input text contains pronouns from both the Only male and Only female categories, or none of the above-mentioned pronouns. """,
}
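
# A minimal sketch of a typo perturber with the two parameters described in
# "ProbTypos_description" and "MaxTypo_description". The descriptions do not
# say which typing mistakes the real perturber makes or how many typos a
# corrupted word receives, so this sketch ASSUMES random lowercase character
# substitution and a uniform draw of 1..max_typos substitutions per corrupted
# word. The names add_typos, LEVELS_PROB_TYPO and LEVELS_MAX_TYPOS are
# hypothetical and are not taken from the evaluated tool.

```python
import random
import string

# Level definitions copied from the descriptions above.
LEVELS_PROB_TYPO = {1: 0.10, 2: 0.30, 3: 0.50}  # maximum typos per word fixed at 1
LEVELS_MAX_TYPOS = {1: 1, 2: 3, 3: 5}           # probability of a typo fixed at 10%


def add_typos(text, prob_typo, max_typos, rng):
    """Corrupt each whitespace-separated word with probability prob_typo.

    A corrupted word receives between 1 and max_typos character
    substitutions (assumption: the real perturber may use other typo kinds).
    A substitution may pick the original character, so the number of visible
    changes per word is at most max_typos.
    """
    perturbed = []
    for word in text.split():
        if rng.random() < prob_typo:
            chars = list(word)
            for _ in range(rng.randint(1, max_typos)):
                pos = rng.randrange(len(chars))
                chars[pos] = rng.choice(string.ascii_lowercase)
            word = "".join(chars)
        perturbed.append(word)
    return " ".join(perturbed)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that runs are repeatable
    question = "When did the Normans arrive in England?"
    # Sweep the 'Probability of a typo in a word' levels with max typos = 1.
    for level, prob in LEVELS_PROB_TYPO.items():
        print(level, add_typos(question, prob, 1, rng))
    # Sweep the 'Maximum typos per word' levels with probability = 10%.
    for level, max_typos in LEVELS_MAX_TYPOS.items():
        print(level, add_typos(question, 0.10, max_typos, rng))
```

# Passing the random generator explicitly keeps the sketch deterministic under
# a fixed seed, which makes robustness levels comparable across runs.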
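
# A minimal sketch of the keyword-based categorization described in
# "ethnicity_categories_text" and "gender_categories_text". The keyword lists
# below are abbreviated for illustration (a few keywords per category), and
# the matching rule is an ASSUMPTION: the descriptions only say that keywords
# "appear in the text", so case-insensitive whole-word matching is used here.
# Plain substring matching would, for example, find "asian" inside
# "caucasian" and "he" inside "the". All function and variable names are
# hypothetical.

```python
import re

# Abbreviated keyword lists; the full lists are given in the description text.
ETHNICITY_KEYWORDS = {
    "Hispanic or Latino": ["mexican", "cuban", "latino", "hispanic"],
    "White": ["german", "irish", "caucasian", "white"],
    "Black or African American": ["african", "nigerian", "african american", "black"],
    "Native Hawaiian or Pacific Islander": ["samoan", "maori", "pacific islander"],
    "Asian": ["chinese", "korean", "japanese", "asian"],
    "Native American or Alaska Native": ["cherokee", "navajo", "alaska native"],
}
MALE_PRONOUNS = {"he", "his", "him", "himself"}
FEMALE_PRONOUNS = {"she", "hers", "her", "herself"}


def contains_keyword(text, keyword):
    """Case-insensitive whole-word (or whole-phrase) match; an assumed rule."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text.lower()) is not None


def ethnicity_category(text):
    """Return the single matching category, 'Two or more', or 'None'."""
    matched = [
        category
        for category, keywords in ETHNICITY_KEYWORDS.items()
        if any(contains_keyword(text, keyword) for keyword in keywords)
    ]
    if len(matched) > 1:
        return "Two or more"
    return matched[0] if matched else "None"


def gender_category(text):
    """Return 'Only male', 'Only female', or 'Either both or none'."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    has_male = bool(tokens & MALE_PRONOUNS)
    has_female = bool(tokens & FEMALE_PRONOUNS)
    if has_male and not has_female:
        return "Only male"
    if has_female and not has_male:
        return "Only female"
    return "Either both or none"


if __name__ == "__main__":
    print(ethnicity_category("German and Korean teams met."))
    print(gender_category("She thanked him for the book."))
```

# Whole-word matching misses inflected forms such as plurals; whether the
# original categorization handles them is not stated in the descriptions.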