sileod's picture
Create tasks.md
b0d57dd
|
raw
history blame
14 kB
- 0 babi_nli/counting
- 1 babi_nli/indefinite-knowledge
- 2 babi_nli/simple-negation
- 3 babi_nli/three-arg-relations
- 4 babi_nli/basic-induction
- 5 babi_nli/time-reasoning
- 6 babi_nli/compound-coreference
- 7 babi_nli/path-finding
- 8 babi_nli/positional-reasoning
- 9 babi_nli/conjunction
- 10 babi_nli/size-reasoning
- 11 babi_nli/yes-no-questions
- 12 babi_nli/basic-coreference
- 13 babi_nli/two-supporting-facts
- 14 babi_nli/lists-sets
- 15 babi_nli/two-arg-relations
- 16 babi_nli/three-supporting-facts
- 17 babi_nli/basic-deduction
- 18 babi_nli/single-supporting-fact
- 19 anli/a1
- 20 anli/a2
- 21 anli/a3
- 22 sick/label
- 23 sick/relatedness
- 24 sick/entailment_AB
- 25 sick/entailment_BA
- 26 snli
- 27 scitail/snli_format
- 28 hans
- 29 WANLI
- 30 recast/recast_kg_relations
- 31 recast/recast_puns
- 32 recast/recast_factuality
- 33 recast/recast_megaveridicality
- 34 recast/recast_verbcorner
- 35 recast/recast_verbnet
- 36 recast/recast_ner
- 37 recast/recast_sentiment
- 38 probability_words_nli/usnli
- 39 probability_words_nli/reasoning_1hop
- 40 probability_words_nli/reasoning_2hop
- 41 nan-nli/joey234--nan-nli
- 42 nli_fever
- 43 breaking_nli
- 44 conj_nli
- 45 fracas
- 46 dialogue_nli
- 47 mpe
- 48 dnc
- 49 gpt3_nli
- 50 recast_white/fnplus
- 51 recast_white/sprl
- 52 recast_white/dpr
- 53 joci
- 54 contrast_nli
- 55 robust_nli/IS_CS
- 56 robust_nli/LI_LI
- 57 robust_nli/ST_WO
- 58 robust_nli/PI_SP
- 59 robust_nli/PI_CD
- 60 robust_nli/ST_SE
- 61 robust_nli/ST_NE
- 62 robust_nli/ST_LM
- 63 robust_nli_is_sd
- 64 robust_nli_li_ts
- 65 gen_debiased_nli/snli_seq_z
- 66 gen_debiased_nli/snli_z_aug
- 67 gen_debiased_nli/snli_par_z
- 68 gen_debiased_nli/mnli_par_z
- 69 gen_debiased_nli/mnli_z_aug
- 70 gen_debiased_nli/mnli_seq_z
- 71 add_one_rte
- 72 imppres/presupposition_cleft_uniqueness/presupposition
- 73 imppres/presupposition_possessed_definites_uniqueness/presupposition
- 74 imppres/presupposition_possessed_definites_existence/presupposition
- 75 imppres/presupposition_only_presupposition/presupposition
- 76 imppres/presupposition_all_n_presupposition/presupposition
- 77 imppres/presupposition_both_presupposition/presupposition
- 78 imppres/presupposition_change_of_state/presupposition
- 79 imppres/presupposition_cleft_existence/presupposition
- 80 imppres/presupposition_question_presupposition/presupposition
- 81 imppres/implicature_modals/prag
- 82 imppres/implicature_numerals_10_100/prag
- 83 imppres/implicature_numerals_2_3/prag
- 84 imppres/implicature_gradable_adjective/prag
- 85 imppres/implicature_quantifiers/prag
- 86 imppres/implicature_gradable_verb/prag
- 87 imppres/implicature_connectives/prag
- 88 imppres/implicature_gradable_adjective/log
- 89 imppres/implicature_gradable_verb/log
- 90 imppres/implicature_numerals_2_3/log
- 91 imppres/implicature_numerals_10_100/log
- 92 imppres/implicature_modals/log
- 93 imppres/implicature_quantifiers/log
- 94 imppres/implicature_connectives/log
- 95 glue_diagnostics/diagnostics
- 96 hlgd
- 97 paws/labeled_final
- 98 paws/labeled_swap
- 99 quora
- 100 medical_questions_pairs
- 101 conll2003/pos_tags
- 102 conll2003/chunk_tags
- 103 conll2003/ner_tags
- 104 hh-rlhf
- 105 model-written-evals
- 106 truthful_qa/multiple_choice
- 107 fig-qa
- 108 bigbench/fantasy_reasoning
- 109 bigbench/nonsense_words_grammar
- 110 bigbench/analytic_entailment
- 111 bigbench/logic_grid_puzzle
- 112 bigbench/geometric_shapes
- 113 bigbench/key_value_maps
- 114 bigbench/analogical_similarity
- 115 bigbench/metaphor_understanding
- 116 bigbench/metaphor_boolean
- 117 bigbench/ruin_names
- 118 bigbench/cs_algorithms
- 119 bigbench/physical_intuition
- 120 bigbench/mnist_ascii
- 121 bigbench/moral_permissibility
- 122 bigbench/emoji_movie
- 123 bigbench/snarks
- 124 bigbench/timedial
- 125 bigbench/dark_humor_detection
- 126 bigbench/gre_reading_comprehension
- 127 bigbench/empirical_judgments
- 128 bigbench/causal_judgment
- 129 bigbench/fact_checker
- 130 bigbench/logical_fallacy_detection
- 131 bigbench/identify_math_theorems
- 132 bigbench/dyck_languages
- 133 bigbench/winowhy
- 134 bigbench/logical_sequence
- 135 bigbench/strategyqa
- 136 bigbench/unit_interpretation
- 137 bigbench/authorship_verification
- 138 bigbench/undo_permutation
- 139 bigbench/epistemic_reasoning
- 140 bigbench/human_organs_senses
- 141 bigbench/misconceptions
- 142 bigbench/international_phonetic_alphabet_nli
- 143 bigbench/identify_odd_metaphor
- 144 bigbench/mathematical_induction
- 145 bigbench/odd_one_out
- 146 bigbench/reasoning_about_colored_objects
- 147 bigbench/strange_stories
- 148 bigbench/evaluating_information_essentiality
- 149 bigbench/figure_of_speech_detection
- 150 bigbench/english_proverbs
- 151 bigbench/general_knowledge
- 152 bigbench/tracking_shuffled_objects
- 153 bigbench/physics
- 154 bigbench/anachronisms
- 155 bigbench/simple_ethical_questions
- 156 bigbench/logical_args
- 157 bigbench/suicide_risk
- 158 bigbench/sentence_ambiguity
- 159 bigbench/temporal_sequences
- 160 bigbench/penguins_in_a_table
- 161 bigbench/sports_understanding
- 162 bigbench/hyperbaton
- 163 bigbench/code_line_description
- 164 bigbench/question_selection
- 165 bigbench/disambiguation_qa
- 166 bigbench/date_understanding
- 167 bigbench/play_dialog_same_or_different
- 168 bigbench/salient_translation_error_detection
- 169 bigbench/irony_identification
- 170 bigbench/emojis_emotion_prediction
- 171 bigbench/hindu_knowledge
- 172 bigbench/conceptual_combinations
- 173 bigbench/implicatures
- 174 bigbench/movie_dialog_same_or_different
- 175 bigbench/social_support
- 176 bigbench/presuppositions_as_nli
- 177 bigbench/vitaminc_fact_verification
- 178 bigbench/hhh_alignment
- 179 bigbench/implicit_relations
- 180 bigbench/bbq_lite_json
- 181 bigbench/phrase_relatedness
- 182 bigbench/logical_deduction
- 183 bigbench/discourse_marker_prediction
- 184 bigbench/movie_recommendation
- 185 bigbench/real_or_fake_text
- 186 bigbench/formal_fallacies_syllogisms_negation
- 187 bigbench/crass_ai
- 188 blimp/inchoative
- 189 blimp/principle_A_c_command
- 190 blimp/matrix_question_npi_licensor_present
- 191 blimp/wh_questions_subject_gap_long_distance
- 192 blimp/sentential_subject_island
- 193 blimp/existential_there_quantifiers_2
- 194 blimp/sentential_negation_npi_scope
- 195 blimp/complex_NP_island
- 196 blimp/principle_A_reconstruction
- 197 blimp/animate_subject_passive
- 198 blimp/tough_vs_raising_1
- 199 blimp/wh_vs_that_with_gap
- 200 blimp/principle_A_domain_2
- 201 blimp/npi_present_1
- 202 blimp/wh_vs_that_with_gap_long_distance
- 203 blimp/superlative_quantifiers_1
- 204 blimp/npi_present_2
- 205 blimp/wh_questions_object_gap
- 206 blimp/coordinate_structure_constraint_complex_left_branch
- 207 blimp/coordinate_structure_constraint_object_extraction
- 208 blimp/left_branch_island_echo_question
- 209 blimp/drop_argument
- 210 cos_e/v1.0
- 211 cosmos_qa
- 212 dream
- 213 openbookqa
- 214 qasc
- 215 quartz
- 216 quail
- 217 head_qa/en
- 218 sciq
- 219 social_i_qa
- 220 wiki_hop
- 221 wiqa
- 222 piqa
- 223 hellaswag
- 224 super_glue/copa
- 225 art
- 226 hendrycks_test/moral_disputes
- 227 hendrycks_test/moral_scenarios
- 228 hendrycks_test/nutrition
- 229 hendrycks_test/philosophy
- 230 hendrycks_test/prehistory
- 231 hendrycks_test/professional_accounting
- 232 hendrycks_test/professional_law
- 233 hendrycks_test/world_religions
- 234 hendrycks_test/professional_psychology
- 235 hendrycks_test/public_relations
- 236 hendrycks_test/security_studies
- 237 hendrycks_test/sociology
- 238 hendrycks_test/us_foreign_policy
- 239 hendrycks_test/virology
- 240 hendrycks_test/miscellaneous
- 241 hendrycks_test/professional_medicine
- 242 hendrycks_test/medical_genetics
- 243 hendrycks_test/college_mathematics
- 244 hendrycks_test/management
- 245 hendrycks_test/high_school_computer_science
- 246 hendrycks_test/astronomy
- 247 hendrycks_test/high_school_chemistry
- 248 hendrycks_test/high_school_biology
- 249 hendrycks_test/global_facts
- 250 hendrycks_test/formal_logic
- 251 hendrycks_test/elementary_mathematics
- 252 hendrycks_test/high_school_european_history
- 253 hendrycks_test/electrical_engineering
- 254 hendrycks_test/conceptual_physics
- 255 hendrycks_test/computer_security
- 256 hendrycks_test/college_physics
- 257 hendrycks_test/college_medicine
- 258 hendrycks_test/college_computer_science
- 259 hendrycks_test/college_chemistry
- 260 hendrycks_test/college_biology
- 261 hendrycks_test/econometrics
- 262 hendrycks_test/clinical_knowledge
- 263 hendrycks_test/anatomy
- 264 hendrycks_test/marketing
- 265 hendrycks_test/machine_learning
- 266 hendrycks_test/logical_fallacies
- 267 hendrycks_test/jurisprudence
- 268 hendrycks_test/international_law
- 269 hendrycks_test/human_sexuality
- 270 hendrycks_test/human_aging
- 271 hendrycks_test/high_school_world_history
- 272 hendrycks_test/abstract_algebra
- 273 hendrycks_test/high_school_us_history
- 274 hendrycks_test/high_school_psychology
- 275 hendrycks_test/high_school_physics
- 276 hendrycks_test/high_school_microeconomics
- 277 hendrycks_test/high_school_mathematics
- 278 hendrycks_test/high_school_macroeconomics
- 279 hendrycks_test/high_school_government_and_politics
- 280 hendrycks_test/high_school_geography
- 281 hendrycks_test/high_school_statistics
- 282 hendrycks_test/business_ethics
- 283 winogrande/winogrande_xl
- 284 codah/codah
- 285 ai2_arc/ARC-Challenge/challenge
- 286 ai2_arc/ARC-Easy/challenge
- 287 definite_pronoun_resolution
- 288 swag
- 289 math_qa
- 290 utilitarianism
- 291 TuringBench
- 292 trec
- 293 vitaminc/tals--vitaminc
- 294 hope_edi/english
- 295 rumoureval_2019/RumourEval2019
- 296 ethos/binary
- 297 ethos/multilabel
- 298 glue/cola
- 299 glue/sst2
- 300 glue/mrpc
- 301 glue/qqp
- 302 glue/stsb
- 303 glue/mnli
- 304 glue/qnli
- 305 glue/rte
- 306 glue/wnli
- 307 super_glue/boolq
- 308 super_glue/cb
- 309 super_glue/multirc
- 310 super_glue/wic
- 311 super_glue/axg
- 312 tweet_eval/stance_feminist
- 313 tweet_eval/stance_atheism
- 314 tweet_eval/stance_hillary
- 315 tweet_eval/stance_abortion
- 316 tweet_eval/sentiment
- 317 tweet_eval/offensive
- 318 tweet_eval/stance_climate
- 319 tweet_eval/irony
- 320 tweet_eval/emotion
- 321 tweet_eval/emoji
- 322 tweet_eval/hate
- 323 discovery/discovery
- 324 pragmeval/switchboard
- 325 pragmeval/squinky-informativeness
- 326 pragmeval/emobank-arousal
- 327 pragmeval/emobank-dominance
- 328 pragmeval/emobank-valence
- 329 pragmeval/mrda
- 330 pragmeval/verifiability
- 331 pragmeval/squinky-implicature
- 332 pragmeval/squinky-formality
- 333 pragmeval/gum
- 334 pragmeval/emergent
- 335 pragmeval/persuasiveness-premisetype
- 336 pragmeval/pdtb
- 337 pragmeval/persuasiveness-eloquence
- 338 pragmeval/persuasiveness-specificity
- 339 pragmeval/persuasiveness-strength
- 340 pragmeval/sarcasm
- 341 pragmeval/stac
- 342 pragmeval/persuasiveness-claimtype
- 343 pragmeval/persuasiveness-relevance
- 344 lex_glue/eurlex
- 345 lex_glue/scotus
- 346 lex_glue/ledgar
- 347 lex_glue/unfair_tos
- 348 lex_glue/case_hold
- 349 imdb
- 350 rotten_tomatoes
- 351 ag_news
- 352 yelp_review_full/yelp_review_full
- 353 financial_phrasebank/sentences_allagree
- 354 poem_sentiment
- 355 dbpedia_14/dbpedia_14
- 356 amazon_polarity/amazon_polarity
- 357 app_reviews
- 358 hate_speech18
- 359 sms_spam
- 360 humicroedit/subtask-1
- 361 humicroedit/subtask-2
- 362 snips_built_in_intents
- 363 banking77
- 364 hate_speech_offensive
- 365 hyperpartisan_news_detection/byarticle
- 366 hyperpartisan_news_detection/bypublisher
- 367 go_emotions/simplified
- 368 scicite
- 369 liar
- 370 lexical_relation_classification/ROOT09
- 371 lexical_relation_classification/EVALution
- 372 lexical_relation_classification/CogALexV
- 373 lexical_relation_classification/BLESS
- 374 lexical_relation_classification/K&H+N
- 375 linguisticprobing/coordination_inversion
- 376 linguisticprobing/odd_man_out
- 377 linguisticprobing/word_content
- 378 linguisticprobing/obj_number
- 379 linguisticprobing/past_present
- 380 linguisticprobing/tree_depth
- 381 linguisticprobing/sentence_length
- 382 linguisticprobing/top_constituents
- 383 linguisticprobing/bigram_shift
- 384 linguisticprobing/subj_number
- 385 crowdflower/sentiment_nuclear_power
- 386 crowdflower/tweet_global_warming
- 387 crowdflower/airline-sentiment
- 388 crowdflower/economic-news
- 389 crowdflower/political-media-audience
- 390 crowdflower/political-media-bias
- 391 crowdflower/political-media-message
- 392 crowdflower/text_emotion
- 393 crowdflower/corporate-messaging
- 394 ethics/commonsense
- 395 ethics/deontology
- 396 ethics/justice
- 397 ethics/virtue
- 398 emo/emo2019
- 399 google_wellformed_query
- 400 tweets_hate_speech_detection
- 401 adv_glue/adv_sst2
- 402 adv_glue/adv_qqp
- 403 adv_glue/adv_mnli
- 404 adv_glue/adv_mnli_mismatched
- 405 adv_glue/adv_qnli
- 406 adv_glue/adv_rte
- 407 has_part
- 408 wnut_17/wnut_17
- 409 ncbi_disease/ncbi_disease
- 410 acronym_identification
- 411 jnlpba/jnlpba
- 412 species_800/species_800
- 413 ontonotes_english/SpeedOfMagic--ontonotes_english
- 414 blog_authorship_corpus/gender
- 415 blog_authorship_corpus/age
- 416 blog_authorship_corpus/horoscope
- 417 blog_authorship_corpus/job
- 418 open_question_type
- 419 health_fact
- 420 commonsense_qa
- 421 mc_taco
- 422 ade_corpus_v2/Ade_corpus_v2_classification
- 423 discosense
- 424 circa
- 425 code_x_glue_cc_defect_detection
- 426 code_x_glue_cc_clone_detection_big_clone_bench
- 427 code_x_glue_cc_code_refinement/medium
- 428 EffectiveFeedbackStudentWriting
- 429 promptSentiment
- 430 promptNLI
- 431 promptSpoke
- 432 promptProficiency
- 433 promptGrammar
- 434 promptCoherence
- 435 phrase_similarity
- 436 scientific-exaggeration-detection
- 437 quarel
- 438 fever-evidence-related/mwong--fever-related
- 439 numer_sense
- 440 dynasent/dynabench.dynasent.r1.all/r1
- 441 dynasent/dynabench.dynasent.r2.all/r2
- 442 Sarcasm_News_Headline
- 443 sem_eval_2010_task_8