Available Tasks
To list all available tasks, run:
lighteval tasks list
To inspect a specific task, run:
lighteval tasks inspect <task_name>
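The task identifiers on this page appear to follow a `suite|task` pattern, where the task part may carry colon-separated subsets (for example `helm|mmlu:anatomy`). As a minimal sketch under that assumption, the snippet below splits identifiers and groups them by suite. The helper names `split_task_id` and `group_by_suite` are hypothetical and are not part of lighteval; the sample identifiers are copied from the list on this page.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_task_id(task_id: str) -> Tuple[str, str]:
    """Split 'suite|task' into (suite, task).

    Hypothetical helper: it assumes the 'suite|task' convention seen in
    the list on this page and does not call lighteval.
    """
    # partition() splits on the first '|' only, so colons (or any other
    # characters) in the task part are left untouched.
    suite, sep, task = task_id.partition("|")
    if not sep or not suite or not task:
        raise ValueError(f"expected 'suite|task', got {task_id!r}")
    return suite, task


def group_by_suite(task_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Map each suite to the task names listed under it, in input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for task_id in task_ids:
        suite, task = split_task_id(task_id)
        groups[suite].append(task)
    return dict(groups)


# Sample identifiers taken from the list on this page.
sample = [
    "helm|mmlu:anatomy",
    "leaderboard|mmlu:anatomy",
    "lighteval|gsm8k",
    "harness|wikitext:103:document_level",
]
groups = group_by_suite(sample)
```

Grouping this way makes it easy to see that the same benchmark (here `mmlu:anatomy`) can be exposed by more than one suite, so the suite prefix is needed to pick a specific variant.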
List of tasks
bigbench:
- bigbench|abstract_narrative_understanding
- bigbench|anachronisms
- bigbench|analogical_similarity
- bigbench|analytic_entailment
- bigbench|arithmetic_bb
- bigbench|ascii_word_recognition
- bigbench|authorship_verification
- bigbench|auto_categorization
- bigbench|auto_debugging
- bigbench|bbq_lite_json
- bigbench|bridging_anaphora_resolution_barqa
- bigbench|causal_judgment
- bigbench|cause_and_effect
- bigbench|checkmate_in_one
- bigbench|chess_state_tracking
- bigbench|chinese_remainder_theorem
- bigbench|cifar10_classification
- bigbench|code_line_description
- bigbench|codenames
- bigbench|color
- bigbench|common_morpheme
- bigbench|conceptual_combinations
- bigbench|conlang_translation
- bigbench|contextual_parametric_knowledge_conflicts
- bigbench|coqa_bb
- bigbench|crash_blossom
- bigbench|crass_ai
- bigbench|cryobiology_spanish
- bigbench|cryptonite
- bigbench|cs_algorithms
- bigbench|dark_humor_detection
- bigbench|date_understanding
- bigbench|disambiguation_qa
- bigbench|discourse_marker_prediction
- bigbench|disfl_qa
- bigbench|dyck_languages
- bigbench|elementary_math_qa
- bigbench|emoji_movie
- bigbench|emojis_emotion_prediction
- bigbench|empirical_judgments
- bigbench|english_proverbs
- bigbench|english_russian_proverbs
- bigbench|entailed_polarity
- bigbench|entailed_polarity_hindi
- bigbench|epistemic_reasoning
- bigbench|evaluating_information_essentiality
- bigbench|fact_checker
- bigbench|fantasy_reasoning
- bigbench|few_shot_nlg
- bigbench|figure_of_speech_detection
- bigbench|formal_fallacies_syllogisms_negation
- bigbench|gem
- bigbench|gender_inclusive_sentences_german
- bigbench|general_knowledge
- bigbench|geometric_shapes
- bigbench|goal_step_wikihow
- bigbench|gre_reading_comprehension
- bigbench|hhh_alignment
- bigbench|hindi_question_answering
- bigbench|hindu_knowledge
- bigbench|hinglish_toxicity
- bigbench|human_organs_senses
- bigbench|hyperbaton
- bigbench|identify_math_theorems
- bigbench|identify_odd_metaphor
- bigbench|implicatures
- bigbench|implicit_relations
- bigbench|intent_recognition
- bigbench|international_phonetic_alphabet_nli
- bigbench|international_phonetic_alphabet_transliterate
- bigbench|intersect_geometry
- bigbench|irony_identification
- bigbench|kanji_ascii
- bigbench|kannada
- bigbench|key_value_maps
- bigbench|known_unknowns
- bigbench|language_games
- bigbench|language_identification
- bigbench|linguistic_mappings
- bigbench|linguistics_puzzles
- bigbench|logic_grid_puzzle
- bigbench|logical_args
- bigbench|logical_deduction
- bigbench|logical_fallacy_detection
- bigbench|logical_sequence
- bigbench|mathematical_induction
- bigbench|matrixshapes
- bigbench|metaphor_boolean
- bigbench|metaphor_understanding
- bigbench|minute_mysteries_qa
- bigbench|misconceptions
- bigbench|misconceptions_russian
- bigbench|mnist_ascii
- bigbench|modified_arithmetic
- bigbench|moral_permissibility
- bigbench|movie_dialog_same_or_different
- bigbench|movie_recommendation
- bigbench|mult_data_wrangling
- bigbench|multiemo
- bigbench|natural_instructions
- bigbench|navigate
- bigbench|nonsense_words_grammar
- bigbench|novel_concepts
- bigbench|object_counting
- bigbench|odd_one_out
- bigbench|operators
- bigbench|paragraph_segmentation
- bigbench|parsinlu_qa
- bigbench|parsinlu_reading_comprehension
- bigbench|penguins_in_a_table
- bigbench|periodic_elements
- bigbench|persian_idioms
- bigbench|phrase_relatedness
- bigbench|physical_intuition
- bigbench|physics
- bigbench|physics_questions
- bigbench|play_dialog_same_or_different
- bigbench|polish_sequence_labeling
- bigbench|presuppositions_as_nli
- bigbench|qa_wikidata
- bigbench|question_selection
- bigbench|real_or_fake_text
- bigbench|reasoning_about_colored_objects
- bigbench|repeat_copy_logic
- bigbench|rephrase
- bigbench|rhyming
- bigbench|riddle_sense
- bigbench|ruin_names
- bigbench|salient_translation_error_detection
- bigbench|scientific_press_release
- bigbench|semantic_parsing_in_context_sparc
- bigbench|semantic_parsing_spider
- bigbench|sentence_ambiguity
- bigbench|similarities_abstraction
- bigbench|simp_turing_concept
- bigbench|simple_arithmetic_json
- bigbench|simple_arithmetic_json_multiple_choice
- bigbench|simple_arithmetic_json_subtasks
- bigbench|simple_arithmetic_multiple_targets_json
- bigbench|simple_ethical_questions
- bigbench|simple_text_editing
- bigbench|snarks
- bigbench|social_iqa
- bigbench|social_support
- bigbench|sports_understanding
- bigbench|strange_stories
- bigbench|strategyqa
- bigbench|sufficient_information
- bigbench|suicide_risk
- bigbench|swahili_english_proverbs
- bigbench|swedish_to_german_proverbs
- bigbench|symbol_interpretation
- bigbench|tellmewhy
- bigbench|temporal_sequences
- bigbench|tense
- bigbench|timedial
- bigbench|topical_chat
- bigbench|tracking_shuffled_objects
- bigbench|understanding_fables
- bigbench|undo_permutation
- bigbench|unit_conversion
- bigbench|unit_interpretation
- bigbench|unnatural_in_context_learning
- bigbench|vitaminc_fact_verification
- bigbench|what_is_the_tao
- bigbench|which_wiki_edit
- bigbench|wino_x_german
- bigbench|winowhy
- bigbench|word_sorting
- bigbench|word_unscrambling
harness:
- harness|bbh:boolean_expressions
- harness|bbh:causal_judgment
- harness|bbh:date_understanding
- harness|bbh:disambiguation_qa
- harness|bbh:dyck_languages
- harness|bbh:formal_fallacies
- harness|bbh:geometric_shapes
- harness|bbh:hyperbaton
- harness|bbh:logical_deduction_five_objects
- harness|bbh:logical_deduction_seven_objects
- harness|bbh:logical_deduction_three_objects
- harness|bbh:movie_recommendation
- harness|bbh:multistep_arithmetic_two
- harness|bbh:navigate
- harness|bbh:object_counting
- harness|bbh:penguins_in_a_table
- harness|bbh:reasoning_about_colored_objects
- harness|bbh:ruin_names
- harness|bbh:salient_translation_error_detection
- harness|bbh:snarks
- harness|bbh:sports_understanding
- harness|bbh:temporal_sequences
- harness|bbh:tracking_shuffled_objects_five_objects
- harness|bbh:tracking_shuffled_objects_seven_objects
- harness|bbh:tracking_shuffled_objects_three_objects
- harness|bbh:web_of_lies
- harness|bbh:word_sorting
- harness|bigbench:causal_judgment
- harness|bigbench:date_understanding
- harness|bigbench:disambiguation_qa
- harness|bigbench:geometric_shapes
- harness|bigbench:logical_deduction_five_objects
- harness|bigbench:logical_deduction_seven_objects
- harness|bigbench:logical_deduction_three_objects
- harness|bigbench:movie_recommendation
- harness|bigbench:navigate
- harness|bigbench:reasoning_about_colored_objects
- harness|bigbench:ruin_names
- harness|bigbench:salient_translation_error_detection
- harness|bigbench:snarks
- harness|bigbench:sports_understanding
- harness|bigbench:temporal_sequences
- harness|bigbench:tracking_shuffled_objects_five_objects
- harness|bigbench:tracking_shuffled_objects_seven_objects
- harness|bigbench:tracking_shuffled_objects_three_objects
- harness|wikitext:103:document_level
helm:
- helm|babi_qa
- helm|bbq
- helm|bbq:Age
- helm|bbq:Disability_status
- helm|bbq:Gender_identity
- helm|bbq:Physical_appearance
- helm|bbq:Race_ethnicity
- helm|bbq:Race_x_SES
- helm|bbq:Race_x_gender
- helm|bbq:Religion
- helm|bbq:SES
- helm|bbq:Sexual_orientation
- helm|bbq=Nationality
- helm|bigbench:auto_debugging
- helm|bigbench:bbq_lite_json:age_ambig
- helm|bigbench:bbq_lite_json:age_disambig
- helm|bigbench:bbq_lite_json:disability_status_ambig
- helm|bigbench:bbq_lite_json:disability_status_disambig
- helm|bigbench:bbq_lite_json:gender_identity_ambig
- helm|bigbench:bbq_lite_json:gender_identity_disambig
- helm|bigbench:bbq_lite_json:nationality_ambig
- helm|bigbench:bbq_lite_json:nationality_disambig
- helm|bigbench:bbq_lite_json:physical_appearance_ambig
- helm|bigbench:bbq_lite_json:physical_appearance_disambig
- helm|bigbench:bbq_lite_json:race_ethnicity_ambig
- helm|bigbench:bbq_lite_json:race_ethnicity_disambig
- helm|bigbench:bbq_lite_json:religion_ambig
- helm|bigbench:bbq_lite_json:religion_disambig
- helm|bigbench:bbq_lite_json:ses_ambig
- helm|bigbench:bbq_lite_json:ses_disambig
- helm|bigbench:bbq_lite_json:sexual_orientation_ambig
- helm|bigbench:bbq_lite_json:sexual_orientation_disambig
- helm|bigbench:code_line_description
- helm|bigbench:conceptual_combinations:contradictions
- helm|bigbench:conceptual_combinations:emergent_properties
- helm|bigbench:conceptual_combinations:fanciful_fictional_combinations
- helm|bigbench:conceptual_combinations:homonyms
- helm|bigbench:conceptual_combinations:invented_words
- helm|bigbench:conlang_translation:adna_from
- helm|bigbench:conlang_translation:adna_to
- helm|bigbench:conlang_translation:atikampe_from
- helm|bigbench:conlang_translation:atikampe_to
- helm|bigbench:conlang_translation:gornam_from
- helm|bigbench:conlang_translation:gornam_to
- helm|bigbench:conlang_translation:holuan_from
- helm|bigbench:conlang_translation:holuan_to
- helm|bigbench:conlang_translation:mkafala_from
- helm|bigbench:conlang_translation:mkafala_to
- helm|bigbench:conlang_translation:postpositive_english_from
- helm|bigbench:conlang_translation:postpositive_english_to
- helm|bigbench:conlang_translation:unapuri_from
- helm|bigbench:conlang_translation:unapuri_to
- helm|bigbench:conlang_translation:vaomi_from
- helm|bigbench:conlang_translation:vaomi_to
- helm|bigbench:emoji_movie
- helm|bigbench:formal_fallacies_syllogisms_negation
- helm|bigbench:hindu_knowledge
- helm|bigbench:known_unknowns
- helm|bigbench:language_identification
- helm|bigbench:linguistics_puzzles
- helm|bigbench:logic_grid_puzzle
- helm|bigbench:logical_deduction-five_objects
- helm|bigbench:logical_deduction-seven_objects
- helm|bigbench:logical_deduction-three_objects
- helm|bigbench:misconceptions_russian
- helm|bigbench:novel_concepts
- helm|bigbench:operators
- helm|bigbench:parsinlu_reading_comprehension
- helm|bigbench:play_dialog_same_or_different
- helm|bigbench:repeat_copy_logic
- helm|bigbench:strange_stories-boolean
- helm|bigbench:strange_stories-multiple_choice
- helm|bigbench:strategyqa
- helm|bigbench:symbol_interpretation-adversarial
- helm|bigbench:symbol_interpretation-emoji_agnostic
- helm|bigbench:symbol_interpretation-name_agnostic
- helm|bigbench:symbol_interpretation-plain
- helm|bigbench:symbol_interpretation-tricky
- helm|bigbench:vitaminc_fact_verification
- helm|bigbench:winowhy
- helm|blimp:adjunct_island
- helm|blimp:anaphor_gender_agreement
- helm|blimp:anaphor_number_agreement
- helm|blimp:animate_subject_passive
- helm|blimp:animate_subject_trans
- helm|blimp:causative
- helm|blimp:complex_NP_island
- helm|blimp:coordinate_structure_constraint_complex_left_branch
- helm|blimp:coordinate_structure_constraint_object_extraction
- helm|blimp:determiner_noun_agreement_1
- helm|blimp:determiner_noun_agreement_2
- helm|blimp:determiner_noun_agreement_irregular_1
- helm|blimp:determiner_noun_agreement_irregular_2
- helm|blimp:determiner_noun_agreement_with_adj_2
- helm|blimp:determiner_noun_agreement_with_adj_irregular_1
- helm|blimp:determiner_noun_agreement_with_adj_irregular_2
- helm|blimp:determiner_noun_agreement_with_adjective_1
- helm|blimp:distractor_agreement_relational_noun
- helm|blimp:distractor_agreement_relative_clause
- helm|blimp:drop_argument
- helm|blimp:ellipsis_n_bar_1
- helm|blimp:ellipsis_n_bar_2
- helm|blimp:existential_there_object_raising
- helm|blimp:existential_there_quantifiers_1
- helm|blimp:existential_there_quantifiers_2
- helm|blimp:existential_there_subject_raising
- helm|blimp:expletive_it_object_raising
- helm|blimp:inchoative
- helm|blimp:intransitive
- helm|blimp:irregular_past_participle_adjectives
- helm|blimp:irregular_past_participle_verbs
- helm|blimp:irregular_plural_subject_verb_agreement_1
- helm|blimp:irregular_plural_subject_verb_agreement_2
- helm|blimp:left_branch_island_echo_question
- helm|blimp:left_branch_island_simple_question
- helm|blimp:matrix_question_npi_licensor_present
- helm|blimp:npi_present_1
- helm|blimp:npi_present_2
- helm|blimp:only_npi_licensor_present
- helm|blimp:only_npi_scope
- helm|blimp:passive_1
- helm|blimp:passive_2
- helm|blimp:principle_A_c_command
- helm|blimp:principle_A_case_1
- helm|blimp:principle_A_case_2
- helm|blimp:principle_A_domain_1
- helm|blimp:principle_A_domain_2
- helm|blimp:principle_A_domain_3
- helm|blimp:principle_A_reconstruction
- helm|blimp:regular_plural_subject_verb_agreement_1
- helm|blimp:regular_plural_subject_verb_agreement_2
- helm|blimp:sentential_negation_npi_licensor_present
- helm|blimp:sentential_negation_npi_scope
- helm|blimp:sentential_subject_island
- helm|blimp:superlative_quantifiers_1
- helm|blimp:superlative_quantifiers_2
- helm|blimp:tough_vs_raising_1
- helm|blimp:tough_vs_raising_2
- helm|blimp:transitive
- helm|blimp:wh_island
- helm|blimp:wh_questions_object_gap
- helm|blimp:wh_questions_subject_gap
- helm|blimp:wh_questions_subject_gap_long_distance
- helm|blimp:wh_vs_that_no_gap
- helm|blimp:wh_vs_that_no_gap_long_distance
- helm|blimp:wh_vs_that_with_gap
- helm|blimp:wh_vs_that_with_gap_long_distance
- helm|bold
- helm|bold:gender
- helm|bold:political_ideology
- helm|bold:profession
- helm|bold:race
- helm|bold:religious_ideology
- helm|boolq
- helm|boolq:contrastset
- helm|civil_comments
- helm|civil_comments:LGBTQ
- helm|civil_comments:black
- helm|civil_comments:christian
- helm|civil_comments:female
- helm|civil_comments:male
- helm|civil_comments:muslim
- helm|civil_comments:other_religions
- helm|civil_comments:white
- helm|commonsenseqa
- helm|copyright:n_books_1000-extractions_per_book_1-prefix_length_125
- helm|copyright:n_books_1000-extractions_per_book_1-prefix_length_25
- helm|copyright:n_books_1000-extractions_per_book_1-prefix_length_5
- helm|copyright:n_books_1000-extractions_per_book_3-prefix_length_125
- helm|copyright:n_books_1000-extractions_per_book_3-prefix_length_25
- helm|copyright:n_books_1000-extractions_per_book_3-prefix_length_5
- helm|copyright:oh_the_places
- helm|copyright:pilot
- helm|copyright:popular_books-prefix_length_10
- helm|copyright:popular_books-prefix_length_125
- helm|copyright:popular_books-prefix_length_25
- helm|copyright:popular_books-prefix_length_250
- helm|copyright:popular_books-prefix_length_5
- helm|copyright:popular_books-prefix_length_50
- helm|copyright:prompt_num_line_1-min_lines_20
- helm|copyright:prompt_num_line_10-min_lines_20
- helm|copyright:prompt_num_line_5-min_lines_20
- helm|covid_dialogue
- helm|dyck_language:2
- helm|dyck_language:3
- helm|dyck_language:4
- helm|entity_data_imputation:Buy
- helm|entity_data_imputation:Restaurant
- helm|entity_matching:Abt_Buy
- helm|entity_matching:Amazon_Google
- helm|entity_matching:Beer
- helm|entity_matching:Company
- helm|entity_matching:DBLP_ACM
- helm|entity_matching:DBLP_GoogleScholar
- helm|entity_matching:Dirty_DBLP_ACM
- helm|entity_matching:Dirty_DBLP_GoogleScholar
- helm|entity_matching:Dirty_Walmart_Amazon
- helm|entity_matching:Dirty_iTunes_Amazon
- helm|entity_matching:Walmart_Amazon
- helm|entity_matching:iTunes_Amazon
- helm|entity_matching=Fodors_Zagats
- helm|hellaswag
- helm|imdb
- helm|imdb:contrastset
- helm|interactive_qa_mmlu:abstract_algebra
- helm|interactive_qa_mmlu:college_chemistry
- helm|interactive_qa_mmlu:global_facts
- helm|interactive_qa_mmlu:miscellaneous
- helm|interactive_qa_mmlu:nutrition
- helm|interactive_qa_mmlu:us_foreign_policy
- helm|legal_summarization:billsum
- helm|legal_summarization:eurlexsum
- helm|legal_summarization:multilexsum
- helm|legalsupport
- helm|lexglue:case_hold
- helm|lexglue:ecthr_a
- helm|lexglue:ecthr_b
- helm|lexglue:eurlex
- helm|lexglue:ledgar
- helm|lexglue:scotus
- helm|lexglue:unfair_tos
- helm|lextreme:brazilian_court_decisions_judgment
- helm|lextreme:brazilian_court_decisions_unanimity
- helm|lextreme:covid19_emergency_event
- helm|lextreme:german_argument_mining
- helm|lextreme:greek_legal_code_chapter
- helm|lextreme:greek_legal_code_subject
- helm|lextreme:greek_legal_code_volume
- helm|lextreme:greek_legal_ner
- helm|lextreme:legalnero
- helm|lextreme:lener_br
- helm|lextreme:mapa_coarse
- helm|lextreme:mapa_fine
- helm|lextreme:multi_eurlex_level_1
- helm|lextreme:multi_eurlex_level_2
- helm|lextreme:multi_eurlex_level_3
- helm|lextreme:online_terms_of_service_clause_topics
- helm|lextreme:online_terms_of_service_unfairness_levels
- helm|lextreme:swiss_judgment_prediction
- helm|lsat_qa
- helm|lsat_qa:assignment
- helm|lsat_qa:grouping
- helm|lsat_qa:miscellaneous
- helm|lsat_qa:ordering
- helm|me_q_sum
- helm|med_dialog:healthcaremagic
- helm|med_dialog:icliniq
- helm|med_mcqa
- helm|med_paragraph_simplification
- helm|med_qa
- helm|mmlu
- helm|mmlu:abstract_algebra
- helm|mmlu:anatomy
- helm|mmlu:astronomy
- helm|mmlu:business_ethics
- helm|mmlu:clinical_knowledge
- helm|mmlu:college_biology
- helm|mmlu:college_chemistry
- helm|mmlu:college_computer_science
- helm|mmlu:college_mathematics
- helm|mmlu:college_medicine
- helm|mmlu:college_physics
- helm|mmlu:computer_security
- helm|mmlu:conceptual_physics
- helm|mmlu:econometrics
- helm|mmlu:electrical_engineering
- helm|mmlu:elementary_mathematics
- helm|mmlu:formal_logic
- helm|mmlu:global_facts
- helm|mmlu:high_school_biology
- helm|mmlu:high_school_chemistry
- helm|mmlu:high_school_computer_science
- helm|mmlu:high_school_european_history
- helm|mmlu:high_school_geography
- helm|mmlu:high_school_government_and_politics
- helm|mmlu:high_school_macroeconomics
- helm|mmlu:high_school_mathematics
- helm|mmlu:high_school_microeconomics
- helm|mmlu:high_school_physics
- helm|mmlu:high_school_psychology
- helm|mmlu:high_school_statistics
- helm|mmlu:high_school_us_history
- helm|mmlu:high_school_world_history
- helm|mmlu:human_aging
- helm|mmlu:human_sexuality
- helm|mmlu:international_law
- helm|mmlu:jurisprudence
- helm|mmlu:logical_fallacies
- helm|mmlu:machine_learning
- helm|mmlu:management
- helm|mmlu:marketing
- helm|mmlu:medical_genetics
- helm|mmlu:miscellaneous
- helm|mmlu:moral_disputes
- helm|mmlu:moral_scenarios
- helm|mmlu:nutrition
- helm|mmlu:philosophy
- helm|mmlu:prehistory
- helm|mmlu:professional_accounting
- helm|mmlu:professional_law
- helm|mmlu:professional_medicine
- helm|mmlu:professional_psychology
- helm|mmlu:public_relations
- helm|mmlu:security_studies
- helm|mmlu:sociology
- helm|mmlu:us_foreign_policy
- helm|mmlu:virology
- helm|mmlu:world_religions
- helm|narrativeqa
- helm|numeracy:linear_example
- helm|numeracy:linear_standard
- helm|numeracy:parabola_example
- helm|numeracy:parabola_standard
- helm|numeracy:paraboloid_example
- helm|numeracy:paraboloid_standard
- helm|numeracy:plane_example
- helm|numeracy:plane_standard
- helm|openbookqa
- helm|piqa
- helm|pubmedqa
- helm|quac
- helm|raft:ade_corpus_v2
- helm|raft:banking_77
- helm|raft:neurips_impact_statement_risks
- helm|raft:one_stop_english
- helm|raft:overruling
- helm|raft:semiconductor_org_types
- helm|raft:systematic_review_inclusion
- helm|raft:tai_safety_research
- helm|raft:terms_of_service
- helm|raft:tweet_eval_hate
- helm|raft:twitter_complaints
- helm|real_toxicity_prompts
- helm|siqa
- helm|summarization:cnn-dm
- helm|summarization:xsum
- helm|summarization:xsum-sampled
- helm|synthetic_reasoning:induction
- helm|synthetic_reasoning:natural_easy
- helm|synthetic_reasoning:natural_hard
- helm|synthetic_reasoning:pattern_match
- helm|synthetic_reasoning:variable_substitution
- helm|the_pile:arxiv
- helm|the_pile:bibliotik
- helm|the_pile:commoncrawl
- helm|the_pile:dm-mathematics
- helm|the_pile:enron
- helm|the_pile:europarl
- helm|the_pile:freelaw
- helm|the_pile:github
- helm|the_pile:gutenberg
- helm|the_pile:hackernews
- helm|the_pile:nih-exporter
- helm|the_pile:opensubtitles
- helm|the_pile:openwebtext2
- helm|the_pile:pubmed-abstracts
- helm|the_pile:pubmed-central
- helm|the_pile:stackexchange
- helm|the_pile:upsto
- helm|the_pile:wikipedia
- helm|the_pile:youtubesubtitles
- helm|truthfulqa
- helm|twitterAAE:aa
- helm|twitterAAE:white
- helm|wikifact:applies_to_jurisdiction
- helm|wikifact:atomic_number
- helm|wikifact:author
- helm|wikifact:award_received
- helm|wikifact:basic_form_of_government
- helm|wikifact:capital
- helm|wikifact:capital_of
- helm|wikifact:central_bank
- helm|wikifact:composer
- helm|wikifact:continent
- helm|wikifact:country
- helm|wikifact:country_of_citizenship
- helm|wikifact:country_of_origin
- helm|wikifact:creator
- helm|wikifact:currency
- helm|wikifact:defendant
- helm|wikifact:developer
- helm|wikifact:diplomatic_relation
- helm|wikifact:director
- helm|wikifact:discoverer_or_inventor
- helm|wikifact:drug_or_therapy_used_for_treatment
- helm|wikifact:educated_at
- helm|wikifact:electron_configuration
- helm|wikifact:employer
- helm|wikifact:field_of_work
- helm|wikifact:file_extension
- helm|wikifact:genetic_association
- helm|wikifact:genre
- helm|wikifact:has_part
- helm|wikifact:head_of_government
- helm|wikifact:head_of_state
- helm|wikifact:headquarters_location
- helm|wikifact:industry
- helm|wikifact:influenced_by
- helm|wikifact:instance_of
- helm|wikifact:instrument
- helm|wikifact:language_of_work_or_name
- helm|wikifact:languages_spoken_written_or_signed
- helm|wikifact:laws_applied
- helm|wikifact:located_in_the_administrative_territorial_entity
- helm|wikifact:location
- helm|wikifact:location_of_discovery
- helm|wikifact:location_of_formation
- helm|wikifact:majority_opinion_by
- helm|wikifact:manufacturer
- helm|wikifact:measured_physical_quantity
- helm|wikifact:medical_condition_treated
- helm|wikifact:member_of
- helm|wikifact:member_of_political_party
- helm|wikifact:member_of_sports_team
- helm|wikifact:movement
- helm|wikifact:named_after
- helm|wikifact:native_language
- helm|wikifact:number_of_processor_cores
- helm|wikifact:occupation
- helm|wikifact:office_held_by_head_of_government
- helm|wikifact:office_held_by_head_of_state
- helm|wikifact:official_language
- helm|wikifact:operating_system
- helm|wikifact:original_language_of_film_or_TV_show
- helm|wikifact:original_network
- helm|wikifact:overrules
- helm|wikifact:owned_by
- helm|wikifact:part_of
- helm|wikifact:participating_team
- helm|wikifact:place_of_birth
- helm|wikifact:place_of_death
- helm|wikifact:plaintiff
- helm|wikifact:position_held
- helm|wikifact:position_played_on_team
- helm|wikifact:programming_language
- helm|wikifact:recommended_unit_of_measurement
- helm|wikifact:record_label
- helm|wikifact:religion
- helm|wikifact:repealed_by
- helm|wikifact:shares_border_with
- helm|wikifact:solved_by
- helm|wikifact:statement_describes
- helm|wikifact:stock_exchange
- helm|wikifact:subclass_of
- helm|wikifact:subsidiary
- helm|wikifact:symptoms_and_signs
- helm|wikifact:therapeutic_area
- helm|wikifact:time_of_discovery_or_invention
- helm|wikifact:twinned_administrative_body
- helm|wikifact:work_location
- helm|wikitext:103:document_level
- helm|wmt14:cs-en
- helm|wmt14:de-en
- helm|wmt14:fr-en
- helm|wmt14:hi-en
- helm|wmt14:ru-en
leaderboard:
- leaderboard|arc:challenge
- leaderboard|gsm8k
- leaderboard|hellaswag
- leaderboard|mmlu:abstract_algebra
- leaderboard|mmlu:anatomy
- leaderboard|mmlu:astronomy
- leaderboard|mmlu:business_ethics
- leaderboard|mmlu:clinical_knowledge
- leaderboard|mmlu:college_biology
- leaderboard|mmlu:college_chemistry
- leaderboard|mmlu:college_computer_science
- leaderboard|mmlu:college_mathematics
- leaderboard|mmlu:college_medicine
- leaderboard|mmlu:college_physics
- leaderboard|mmlu:computer_security
- leaderboard|mmlu:conceptual_physics
- leaderboard|mmlu:econometrics
- leaderboard|mmlu:electrical_engineering
- leaderboard|mmlu:elementary_mathematics
- leaderboard|mmlu:formal_logic
- leaderboard|mmlu:global_facts
- leaderboard|mmlu:high_school_biology
- leaderboard|mmlu:high_school_chemistry
- leaderboard|mmlu:high_school_computer_science
- leaderboard|mmlu:high_school_european_history
- leaderboard|mmlu:high_school_geography
- leaderboard|mmlu:high_school_government_and_politics
- leaderboard|mmlu:high_school_macroeconomics
- leaderboard|mmlu:high_school_mathematics
- leaderboard|mmlu:high_school_microeconomics
- leaderboard|mmlu:high_school_physics
- leaderboard|mmlu:high_school_psychology
- leaderboard|mmlu:high_school_statistics
- leaderboard|mmlu:high_school_us_history
- leaderboard|mmlu:high_school_world_history
- leaderboard|mmlu:human_aging
- leaderboard|mmlu:human_sexuality
- leaderboard|mmlu:international_law
- leaderboard|mmlu:jurisprudence
- leaderboard|mmlu:logical_fallacies
- leaderboard|mmlu:machine_learning
- leaderboard|mmlu:management
- leaderboard|mmlu:marketing
- leaderboard|mmlu:medical_genetics
- leaderboard|mmlu:miscellaneous
- leaderboard|mmlu:moral_disputes
- leaderboard|mmlu:moral_scenarios
- leaderboard|mmlu:nutrition
- leaderboard|mmlu:philosophy
- leaderboard|mmlu:prehistory
- leaderboard|mmlu:professional_accounting
- leaderboard|mmlu:professional_law
- leaderboard|mmlu:professional_medicine
- leaderboard|mmlu:professional_psychology
- leaderboard|mmlu:public_relations
- leaderboard|mmlu:security_studies
- leaderboard|mmlu:sociology
- leaderboard|mmlu:us_foreign_policy
- leaderboard|mmlu:virology
- leaderboard|mmlu:world_religions
- leaderboard|truthfulqa:mc
- leaderboard|winogrande
lighteval:
- lighteval|agieval:aqua-rat
- lighteval|agieval:gaokao-biology
- lighteval|agieval:gaokao-chemistry
- lighteval|agieval:gaokao-chinese
- lighteval|agieval:gaokao-english
- lighteval|agieval:gaokao-geography
- lighteval|agieval:gaokao-history
- lighteval|agieval:gaokao-mathqa
- lighteval|agieval:gaokao-physics
- lighteval|agieval:logiqa-en
- lighteval|agieval:logiqa-zh
- lighteval|agieval:lsat-ar
- lighteval|agieval:lsat-lr
- lighteval|agieval:lsat-rc
- lighteval|agieval:sat-en
- lighteval|agieval:sat-en-without-passage
- lighteval|agieval:sat-math
- lighteval|anli
- lighteval|anli:r1
- lighteval|anli:r2
- lighteval|anli:r3
- lighteval|arc:easy
- lighteval|arithmetic:1dc
- lighteval|arithmetic:2da
- lighteval|arithmetic:2dm
- lighteval|arithmetic:2ds
- lighteval|arithmetic:3da
- lighteval|arithmetic:3ds
- lighteval|arithmetic:4da
- lighteval|arithmetic:4ds
- lighteval|arithmetic:5da
- lighteval|arithmetic:5ds
- lighteval|asdiv
- lighteval|bigbench:causal_judgment
- lighteval|bigbench:date_understanding
- lighteval|bigbench:disambiguation_qa
- lighteval|bigbench:geometric_shapes
- lighteval|bigbench:logical_deduction_five_objects
- lighteval|bigbench:logical_deduction_seven_objects
- lighteval|bigbench:logical_deduction_three_objects
- lighteval|bigbench:movie_recommendation
- lighteval|bigbench:navigate
- lighteval|bigbench:reasoning_about_colored_objects
- lighteval|bigbench:ruin_names
- lighteval|bigbench:salient_translation_error_detection
- lighteval|bigbench:snarks
- lighteval|bigbench:sports_understanding
- lighteval|bigbench:temporal_sequences
- lighteval|bigbench:tracking_shuffled_objects_five_objects
- lighteval|bigbench:tracking_shuffled_objects_seven_objects
- lighteval|bigbench:tracking_shuffled_objects_three_objects
- lighteval|blimp:adjunct_island
- lighteval|blimp:anaphor_gender_agreement
- lighteval|blimp:anaphor_number_agreement
- lighteval|blimp:animate_subject_passive
- lighteval|blimp:animate_subject_trans
- lighteval|blimp:causative
- lighteval|blimp:complex_NP_island
- lighteval|blimp:coordinate_structure_constraint_complex_left_branch
- lighteval|blimp:coordinate_structure_constraint_object_extraction
- lighteval|blimp:determiner_noun_agreement_1
- lighteval|blimp:determiner_noun_agreement_2
- lighteval|blimp:determiner_noun_agreement_irregular_1
- lighteval|blimp:determiner_noun_agreement_irregular_2
- lighteval|blimp:determiner_noun_agreement_with_adj_2
- lighteval|blimp:determiner_noun_agreement_with_adj_irregular_1
- lighteval|blimp:determiner_noun_agreement_with_adj_irregular_2
- lighteval|blimp:determiner_noun_agreement_with_adjective_1
- lighteval|blimp:distractor_agreement_relational_noun
- lighteval|blimp:distractor_agreement_relative_clause
- lighteval|blimp:drop_argument
- lighteval|blimp:ellipsis_n_bar_1
- lighteval|blimp:ellipsis_n_bar_2
- lighteval|blimp:existential_there_object_raising
- lighteval|blimp:existential_there_quantifiers_1
- lighteval|blimp:existential_there_quantifiers_2
- lighteval|blimp:existential_there_subject_raising
- lighteval|blimp:expletive_it_object_raising
- lighteval|blimp:inchoative
- lighteval|blimp:intransitive
- lighteval|blimp:irregular_past_participle_adjectives
- lighteval|blimp:irregular_past_participle_verbs
- lighteval|blimp:irregular_plural_subject_verb_agreement_1
- lighteval|blimp:irregular_plural_subject_verb_agreement_2
- lighteval|blimp:left_branch_island_echo_question
- lighteval|blimp:left_branch_island_simple_question
- lighteval|blimp:matrix_question_npi_licensor_present
- lighteval|blimp:npi_present_1
- lighteval|blimp:npi_present_2
- lighteval|blimp:only_npi_licensor_present
- lighteval|blimp:only_npi_scope
- lighteval|blimp:passive_1
- lighteval|blimp:passive_2
- lighteval|blimp:principle_A_c_command
- lighteval|blimp:principle_A_case_1
- lighteval|blimp:principle_A_case_2
- lighteval|blimp:principle_A_domain_1
- lighteval|blimp:principle_A_domain_2
- lighteval|blimp:principle_A_domain_3
- lighteval|blimp:principle_A_reconstruction
- lighteval|blimp:regular_plural_subject_verb_agreement_1
- lighteval|blimp:regular_plural_subject_verb_agreement_2
- lighteval|blimp:sentential_negation_npi_licensor_present
- lighteval|blimp:sentential_negation_npi_scope
- lighteval|blimp:sentential_subject_island
- lighteval|blimp:superlative_quantifiers_1
- lighteval|blimp:superlative_quantifiers_2
- lighteval|blimp:tough_vs_raising_1
- lighteval|blimp:tough_vs_raising_2
- lighteval|blimp:transitive
- lighteval|blimp:wh_island
- lighteval|blimp:wh_questions_object_gap
- lighteval|blimp:wh_questions_subject_gap
- lighteval|blimp:wh_questions_subject_gap_long_distance
- lighteval|blimp:wh_vs_that_no_gap
- lighteval|blimp:wh_vs_that_no_gap_long_distance
- lighteval|blimp:wh_vs_that_with_gap
- lighteval|blimp:wh_vs_that_with_gap_long_distance
- lighteval|coqa
- lighteval|coqa_bb
- lighteval|drop
- lighteval|ethics:commonsense
- lighteval|ethics:deontology
- lighteval|ethics:justice
- lighteval|ethics:utilitarianism
- lighteval|ethics:virtue
- lighteval|glue:cola
- lighteval|glue:mnli
- lighteval|glue:mnli_mismatched
- lighteval|glue:mrpc
- lighteval|glue:qnli
- lighteval|glue:qqp
- lighteval|glue:rte
- lighteval|glue:sst2
- lighteval|glue:stsb
- lighteval|glue:wnli
- lighteval|gpqa
- lighteval|gsm8k
- lighteval|headqa:en
- lighteval|headqa:es
- lighteval|iwslt17:ar-en
- lighteval|iwslt17:de-en
- lighteval|iwslt17:en-ar
- lighteval|iwslt17:en-de
- lighteval|iwslt17:en-fr
- lighteval|iwslt17:en-ja
- lighteval|iwslt17:en-ko
- lighteval|iwslt17:en-zh
- lighteval|iwslt17:fr-en
- lighteval|iwslt17:ja-en
- lighteval|iwslt17:ko-en
- lighteval|iwslt17:zh-en
- lighteval|lambada:openai
- lighteval|lambada:openai:de
- lighteval|lambada:openai:en
- lighteval|lambada:openai:es
- lighteval|lambada:openai:fr
- lighteval|lambada:openai:it
- lighteval|lambada:openai_cloze
- lighteval|lambada:standard
- lighteval|lambada:standard_cloze
- lighteval|logiqa
- lighteval|math:algebra
- lighteval|math:counting_and_probability
- lighteval|math:geometry
- lighteval|math:intermediate_algebra
- lighteval|math:number_theory
- lighteval|math:prealgebra
- lighteval|math:precalculus
- lighteval|math_cot:algebra
- lighteval|math_cot:counting_and_probability
- lighteval|math_cot:geometry
- lighteval|math_cot:intermediate_algebra
- lighteval|math_cot:number_theory
- lighteval|math_cot:prealgebra
- lighteval|math_cot:precalculus
- lighteval|mathqa
- lighteval|mgsm:bn
- lighteval|mgsm:de
- lighteval|mgsm:en
- lighteval|mgsm:es
- lighteval|mgsm:fr
- lighteval|mgsm:ja
- lighteval|mgsm:ru
- lighteval|mgsm:sw
- lighteval|mgsm:te
- lighteval|mgsm:th
- lighteval|mgsm:zh
- lighteval|mtnt2019:en-fr
- lighteval|mtnt2019:en-ja
- lighteval|mtnt2019:fr-en
- lighteval|mtnt2019:ja-en
- lighteval|mutual
- lighteval|mutual_plus
- lighteval|openbookqa
- lighteval|piqa
- lighteval|prost
- lighteval|pubmedqa
- lighteval|qa4mre:2011
- lighteval|qa4mre:2012
- lighteval|qa4mre:2013
- lighteval|qasper
- lighteval|qasper_ll
- lighteval|race:high
- lighteval|sciq
- lighteval|storycloze:2016
- lighteval|storycloze:2018
- lighteval|super_glue:boolq
- lighteval|super_glue:cb
- lighteval|super_glue:copa
- lighteval|super_glue:multirc
- lighteval|super_glue:rte
- lighteval|super_glue:wic
- lighteval|super_glue:wsc
- lighteval|swag
- lighteval|the_pile:arxiv
- lighteval|the_pile:bookcorpus2
- lighteval|the_pile:books3
- lighteval|the_pile:dm-mathematics
- lighteval|the_pile:enron
- lighteval|the_pile:europarl
- lighteval|the_pile:freelaw
- lighteval|the_pile:github
- lighteval|the_pile:gutenberg
- lighteval|the_pile:hackernews
- lighteval|the_pile:nih-exporter
- lighteval|the_pile:opensubtitles
- lighteval|the_pile:openwebtext2
- lighteval|the_pile:philpapers
- lighteval|the_pile:pile-cc
- lighteval|the_pile:pubmed-abstracts
- lighteval|the_pile:pubmed-central
- lighteval|the_pile:stackexchange
- lighteval|the_pile:ubuntu-irc
- lighteval|the_pile:uspto
- lighteval|the_pile:wikipedia
- lighteval|the_pile:youtubesubtitles
- lighteval|toxigen
- lighteval|triviaqa
- lighteval|truthfulqa:gen
- lighteval|unscramble:anagrams1
- lighteval|unscramble:anagrams2
- lighteval|unscramble:cycle_letters
- lighteval|unscramble:random_insertion
- lighteval|unscramble:reversed_words
- lighteval|webqs
- lighteval|wikitext:2
- lighteval|wmt08:cs-en
- lighteval|wmt08:de-en
- lighteval|wmt08:en-cs
- lighteval|wmt08:en-de
- lighteval|wmt08:en-es
- lighteval|wmt08:en-fr
- lighteval|wmt08:en-hu
- lighteval|wmt08:es-en
- lighteval|wmt08:fr-en
- lighteval|wmt08:hu-en
- lighteval|wmt09:cs-en
- lighteval|wmt09:de-en
- lighteval|wmt09:en-cs
- lighteval|wmt09:en-de
- lighteval|wmt09:en-es
- lighteval|wmt09:en-fr
- lighteval|wmt09:en-hu
- lighteval|wmt09:en-it
- lighteval|wmt09:es-en
- lighteval|wmt09:fr-en
- lighteval|wmt09:hu-en
- lighteval|wmt09:it-en
- lighteval|wmt10:cs-en
- lighteval|wmt10:de-en
- lighteval|wmt10:en-cs
- lighteval|wmt10:en-de
- lighteval|wmt10:en-es
- lighteval|wmt10:en-fr
- lighteval|wmt10:es-en
- lighteval|wmt10:fr-en
- lighteval|wmt11:cs-en
- lighteval|wmt11:de-en
- lighteval|wmt11:en-cs
- lighteval|wmt11:en-de
- lighteval|wmt11:en-es
- lighteval|wmt11:en-fr
- lighteval|wmt11:es-en
- lighteval|wmt11:fr-en
- lighteval|wmt12:cs-en
- lighteval|wmt12:de-en
- lighteval|wmt12:en-cs
- lighteval|wmt12:en-de
- lighteval|wmt12:en-es
- lighteval|wmt12:en-fr
- lighteval|wmt12:es-en
- lighteval|wmt12:fr-en
- lighteval|wmt13:cs-en
- lighteval|wmt13:de-en
- lighteval|wmt13:en-cs
- lighteval|wmt13:en-de
- lighteval|wmt13:en-es
- lighteval|wmt13:en-fr
- lighteval|wmt13:en-ru
- lighteval|wmt13:es-en
- lighteval|wmt13:fr-en
- lighteval|wmt13:ru-en
- lighteval|wmt14:cs-en
- lighteval|wmt14:de-en
- lighteval|wmt14:en-cs
- lighteval|wmt14:en-de
- lighteval|wmt14:en-fr
- lighteval|wmt14:en-hi
- lighteval|wmt14:en-ru
- lighteval|wmt14:fr-en
- lighteval|wmt14:hi-en
- lighteval|wmt14:ru-en
- lighteval|wmt15:cs-en
- lighteval|wmt15:de-en
- lighteval|wmt15:en-cs
- lighteval|wmt15:en-de
- lighteval|wmt15:en-fi
- lighteval|wmt15:en-fr
- lighteval|wmt15:en-ru
- lighteval|wmt15:fi-en
- lighteval|wmt15:fr-en
- lighteval|wmt15:ru-en
- lighteval|wmt16:cs-en
- lighteval|wmt16:de-en
- lighteval|wmt16:en-cs
- lighteval|wmt16:en-de
- lighteval|wmt16:en-fi
- lighteval|wmt16:en-ro
- lighteval|wmt16:en-ru
- lighteval|wmt16:en-tr
- lighteval|wmt16:fi-en
- lighteval|wmt16:ro-en
- lighteval|wmt16:ru-en
- lighteval|wmt16:tr-en
- lighteval|wmt17:cs-en
- lighteval|wmt17:de-en
- lighteval|wmt17:en-cs
- lighteval|wmt17:en-de
- lighteval|wmt17:en-fi
- lighteval|wmt17:en-lv
- lighteval|wmt17:en-ru
- lighteval|wmt17:en-tr
- lighteval|wmt17:en-zh
- lighteval|wmt17:fi-en
- lighteval|wmt17:lv-en
- lighteval|wmt17:ru-en
- lighteval|wmt17:tr-en
- lighteval|wmt17:zh-en
- lighteval|wmt18:cs-en
- lighteval|wmt18:de-en
- lighteval|wmt18:en-cs
- lighteval|wmt18:en-de
- lighteval|wmt18:en-et
- lighteval|wmt18:en-fi
- lighteval|wmt18:en-ru
- lighteval|wmt18:en-tr
- lighteval|wmt18:en-zh
- lighteval|wmt18:et-en
- lighteval|wmt18:fi-en
- lighteval|wmt18:ru-en
- lighteval|wmt18:tr-en
- lighteval|wmt18:zh-en
- lighteval|wmt19:cs-de
- lighteval|wmt19:de-cs
- lighteval|wmt19:de-en
- lighteval|wmt19:de-fr
- lighteval|wmt19:en-cs
- lighteval|wmt19:en-de
- lighteval|wmt19:en-fi
- lighteval|wmt19:en-gu
- lighteval|wmt19:en-kk
- lighteval|wmt19:en-lt
- lighteval|wmt19:en-ru
- lighteval|wmt19:en-zh
- lighteval|wmt19:fi-en
- lighteval|wmt19:fr-de
- lighteval|wmt19:gu-en
- lighteval|wmt19:kk-en
- lighteval|wmt19:lt-en
- lighteval|wmt19:ru-en
- lighteval|wmt19:zh-en
- lighteval|wmt20:cs-en
- lighteval|wmt20:de-en
- lighteval|wmt20:de-fr
- lighteval|wmt20:en-cs
- lighteval|wmt20:en-de
- lighteval|wmt20:en-iu
- lighteval|wmt20:en-ja
- lighteval|wmt20:en-km
- lighteval|wmt20:en-pl
- lighteval|wmt20:en-ps
- lighteval|wmt20:en-ru
- lighteval|wmt20:en-ta
- lighteval|wmt20:en-zh
- lighteval|wmt20:fr-de
- lighteval|wmt20:iu-en
- lighteval|wmt20:ja-en
- lighteval|wmt20:km-en
- lighteval|wmt20:pl-en
- lighteval|wmt20:ps-en
- lighteval|wmt20:ru-en
- lighteval|wmt20:ta-en
- lighteval|wmt20:zh-en
- lighteval|wsc273
- lighteval|xcopa:en
- lighteval|xcopa:et
- lighteval|xcopa:ht
- lighteval|xcopa:id
- lighteval|xcopa:it
- lighteval|xcopa:qu
- lighteval|xcopa:sw
- lighteval|xcopa:ta
- lighteval|xcopa:th
- lighteval|xcopa:tr
- lighteval|xcopa:vi
- lighteval|xcopa:zh
- lighteval|xstory_cloze:ar
- lighteval|xstory_cloze:en
- lighteval|xstory_cloze:es
- lighteval|xstory_cloze:eu
- lighteval|xstory_cloze:hi
- lighteval|xstory_cloze:id
- lighteval|xstory_cloze:my
- lighteval|xstory_cloze:ru
- lighteval|xstory_cloze:sw
- lighteval|xstory_cloze:te
- lighteval|xstory_cloze:zh
- lighteval|xwinograd:en
- lighteval|xwinograd:fr
- lighteval|xwinograd:jp
- lighteval|xwinograd:pt
- lighteval|xwinograd:ru
- lighteval|xwinograd:zh
original:
- original|arc:c:letters
- original|arc:c:options
- original|arc:c:simple
- original|mmlu
- original|mmlu:abstract_algebra
- original|mmlu:anatomy
- original|mmlu:astronomy
- original|mmlu:business_ethics
- original|mmlu:clinical_knowledge
- original|mmlu:college_biology
- original|mmlu:college_chemistry
- original|mmlu:college_computer_science
- original|mmlu:college_mathematics
- original|mmlu:college_medicine
- original|mmlu:college_physics
- original|mmlu:computer_security
- original|mmlu:conceptual_physics
- original|mmlu:econometrics
- original|mmlu:electrical_engineering
- original|mmlu:elementary_mathematics
- original|mmlu:formal_logic
- original|mmlu:global_facts
- original|mmlu:high_school_biology
- original|mmlu:high_school_chemistry
- original|mmlu:high_school_computer_science
- original|mmlu:high_school_european_history
- original|mmlu:high_school_geography
- original|mmlu:high_school_government_and_politics
- original|mmlu:high_school_macroeconomics
- original|mmlu:high_school_mathematics
- original|mmlu:high_school_microeconomics
- original|mmlu:high_school_physics
- original|mmlu:high_school_psychology
- original|mmlu:high_school_statistics
- original|mmlu:high_school_us_history
- original|mmlu:high_school_world_history
- original|mmlu:human_aging
- original|mmlu:human_sexuality
- original|mmlu:international_law
- original|mmlu:jurisprudence
- original|mmlu:logical_fallacies
- original|mmlu:machine_learning
- original|mmlu:management
- original|mmlu:marketing
- original|mmlu:medical_genetics
- original|mmlu:miscellaneous
- original|mmlu:moral_disputes
- original|mmlu:moral_scenarios
- original|mmlu:nutrition
- original|mmlu:philosophy
- original|mmlu:prehistory
- original|mmlu:professional_accounting
- original|mmlu:professional_law
- original|mmlu:professional_medicine
- original|mmlu:professional_psychology
- original|mmlu:public_relations
- original|mmlu:security_studies
- original|mmlu:sociology
- original|mmlu:us_foreign_policy
- original|mmlu:virology
- original|mmlu:world_religions
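
The entries in this list appear to follow a `suite|task` pattern, optionally followed by `:subset` (for example `lighteval|wmt14:en-fr` or `original|mmlu:anatomy`). The sketch below is a minimal illustration of splitting such identifiers and grouping them by suite. It is not part of lighteval: the names `TaskId`, `parse_task_id`, and `group_by_suite` are hypothetical helpers, and treating everything after the first colon as the subset is an assumption based only on the names shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class TaskId(NamedTuple):
    """Hypothetical container for one entry of the task list."""
    suite: str
    task: str
    subset: Optional[str]


def parse_task_id(line: str) -> TaskId:
    """Split an entry such as '- lighteval|wmt14:en-fr' into its parts."""
    text = line.strip()
    # Drop the leading list marker used in the listing, if present.
    if text.startswith("- "):
        text = text[2:]
    suite, sep, rest = text.partition("|")
    if not sep or not suite or not rest:
        raise ValueError(f"not a 'suite|task' identifier: {line!r}")
    # Assumption: everything after the first ':' is kept as one subset string,
    # so 'arc:c:letters' yields task 'arc' and subset 'c:letters'.
    task, _, subset = rest.partition(":")
    return TaskId(suite, task, subset or None)


def group_by_suite(lines: Iterable[str]) -> Dict[str, List[TaskId]]:
    """Group parsed entries by suite, skipping headers such as 'original:'."""
    groups: Dict[str, List[TaskId]] = defaultdict(list)
    for line in lines:
        if "|" not in line:
            continue  # suite headers and blank lines carry no task identifier
        parsed = parse_task_id(line)
        groups[parsed.suite].append(parsed)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "- lighteval|wmt14:en-fr",
        "- lighteval|gsm8k",
        "original:",
        "- original|arc:c:letters",
        "- original|mmlu:anatomy",
    ]
    for suite, tasks in group_by_suite(sample).items():
        print(suite, [t.task for t in tasks])
```

A helper like this can be useful for filtering the list, for instance to collect every `wmt` language pair or every `mmlu` subject before choosing which tasks to run.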