open_pl_llm_leaderboard

djstrong committed on Mar 20

Commit 9655a7c • 1 Parent(s): 68bbe4a

add polqa tasks

Files changed (1)

src/about.py +24 -18

@@ -32,7 +32,10 @@ class Tasks(Enum):
     task17 = Task("polish_cbd_regex", "f1,score-first", "cbd_g", "generate_until")
     task18 = Task("polish_klej_ner_multiple_choice", "acc,none", "klej_ner_mc", "multiple_choice")
     task19 = Task("polish_klej_ner_regex", "exact_match,score-first", "klej_ner_g", "generate_until")
-    task20 = Task("polish_poleval2018_task3_test_10k", "word_perplexity,none", "polish_poleval2018_task3_test_10k", "other")
+    task20 = Task("polish_poleval2018_task3_test_10k", "word_perplexity,none", "poleval2018_task3_test_10k", "other")
+    task21 = Task("polish_polqa_reranking_multiple_choice", "acc,none", "polqa_reranking_mc", "other") # multiple_choice
+    task22 = Task("polish_polqa_open_book", "levenshtein,none", "polqa_open_book_g", "other") # generate_until
+    task23 = Task("polish_polqa_closed_book", "levenshtein,none", "polqa_closed_book_g", "other") # generate_until
 NUM_FEWSHOT = 0 # Change with your few shot
 # ---------------------------------------------------
@@ -81,25 +84,28 @@ or join our [Discord SpeakLeash](https://discord.gg/3G9DVM39)
 | Task                            | Dataset                               | Metric    | Type            |
 |---------------------------------|---------------------------------------|-----------|-----------------|
-| belebele_pol_Latn               | facebook/belebele                     | accuracy  | multiple_choice |
 | polemo2_in                      | allegro/klej-polemo2-in               | accuracy  | generate_until  |
-| polemo2_in_multiple_choice      | allegro/klej-polemo2-in               | accuracy  | multiple_choice |
+| polemo2_in_mc      | allegro/klej-polemo2-in               | accuracy  | multiple_choice |
 | polemo2_out                     | allegro/klej-polemo2-out              | accuracy  | generate_until  |
-| polemo2_out_multiple_choice     | allegro/klej-polemo2-out              | accuracy  | multiple_choice |
-| polish_8tags_multiple_choice    | sdadas/8tags                          | accuracy  | multiple_choice |
-| polish_8tags_regex              | sdadas/8tags                          | accuracy  | generate_until  |
-| polish_belebele_regex           | facebook/belebele                     | accuracy  | generate_until  |
-| polish_dyk_multiple_choice      | allegro/klej-dyk                      | binary F1 | multiple_choice |
-| polish_dyk_regex                | allegro/klej-dyk                      | binary F1 | generate_until  |
-| polish_ppc_multiple_choice      | sdadas/ppc                            | accuracy  | multiple_choice |
-| polish_ppc_regex                | sdadas/ppc                            | accuracy  | generate_until  |
-| polish_psc_multiple_choice      | allegro/klej-psc                      | binary F1 | multiple_choice |
-| polish_psc_regex                | allegro/klej-psc                      | binary F1 | generate_until  |
-| polish_cbd_multiple_choice      | ptaszynski/PolishCyberbullyingDataset | macro F1  | multiple_choice |
-| polish_cbd_regex                | ptaszynski/PolishCyberbullyingDataset | macro F1  | generate_until  |
-| polish_klej_ner_multiple_choice | allegro/klej-nkjp-ner                 | accuracy  | multiple_choice |
-| polish_klej_ner_regex           | allegro/klej-nkjp-ner                 | accuracy  | generate_until  |
-| polish_poleval2018_task3_test_10k | enelpol/poleval2018_task3_test_10k   | word perplexity | other |
+| polemo2_out_mc     | allegro/klej-polemo2-out              | accuracy  | multiple_choice |
+| 8tags_mc    | sdadas/8tags                          | accuracy  | multiple_choice |
+| 8tags_g              | sdadas/8tags                          | accuracy  | generate_until  |
+| belebele_mc           | facebook/belebele                     | accuracy  | multiple_choice  |
+| belebele_g           | facebook/belebele                     | accuracy  | generate_until  |
+| dyk_mc      | allegro/klej-dyk                      | binary F1 | multiple_choice |
+| dyk_g                | allegro/klej-dyk                      | binary F1 | generate_until  |
+| ppc_mc      | sdadas/ppc                            | accuracy  | multiple_choice |
+| ppc_g                | sdadas/ppc                            | accuracy  | generate_until  |
+| psc_mc      | allegro/klej-psc                      | binary F1 | multiple_choice |
+| psc_g                | allegro/klej-psc                      | binary F1 | generate_until  |
+| cbd_mc      | ptaszynski/PolishCyberbullyingDataset | macro F1  | multiple_choice |
+| cbd_g                | ptaszynski/PolishCyberbullyingDataset | macro F1  | generate_until  |
+| klej_ner_mc | allegro/klej-nkjp-ner                 | accuracy  | multiple_choice |
+| klej_ner_g           | allegro/klej-nkjp-ner                 | accuracy  | generate_until  |
+| poleval2018_task3_test_10k | enelpol/poleval2018_task3_test_10k   | word perplexity | other |
+| polqa_reranking_mc | ipipan/polqa   | accuracy | other |
+| polqa_open_book_g | ipipan/polqa   | levenshtein | other |
+| polqa_closed_book_g | ipipan/polqa   | levenshtein | other |
 ## Reproducibility
 To reproduce our results, you need to clone the repository:
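
Reading aid for the diff above: each `Task(...)` entry takes four positional strings, which appear to be the harness task name, a `"<metric>,<filter>"` key, the short leaderboard column name, and a task type. The diff does not show the definition of `Task`, so the field names below (`benchmark`, `metric`, `col_name`, `type`) and the helpers `split_metric` and `columns_by_type` are assumptions made for illustration, not the code in `src/about.py`. A minimal sketch of how such entries could be declared and queried:

```python
from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class Task:
    # Field names are assumed; the diff only shows four positional strings.
    benchmark: str  # task name passed to the evaluation harness
    metric: str     # "<metric>,<filter>" key, e.g. "levenshtein,none"
    col_name: str   # short name used as the leaderboard column
    type: str       # "multiple_choice", "generate_until", or "other"


class Tasks(Enum):
    # A subset of the entries from the diff, copied verbatim.
    task19 = Task("polish_klej_ner_regex", "exact_match,score-first", "klej_ner_g", "generate_until")
    task20 = Task("polish_poleval2018_task3_test_10k", "word_perplexity,none", "poleval2018_task3_test_10k", "other")
    task21 = Task("polish_polqa_reranking_multiple_choice", "acc,none", "polqa_reranking_mc", "other")
    task22 = Task("polish_polqa_open_book", "levenshtein,none", "polqa_open_book_g", "other")
    task23 = Task("polish_polqa_closed_book", "levenshtein,none", "polqa_closed_book_g", "other")


def split_metric(metric_key: str) -> tuple[str, str]:
    """Split a "<metric>,<filter>" key into (metric, filter). Hypothetical helper."""
    name, _, filter_name = metric_key.partition(",")
    return name, filter_name


def columns_by_type(task_type: str) -> list[str]:
    """Return column names of all tasks with the given type, in declaration order."""
    return [t.value.col_name for t in Tasks if t.value.type == task_type]


if __name__ == "__main__":
    print(split_metric(Tasks.task22.value.metric))
    print(columns_by_type("other"))
```

Note that in this commit the three PolQA tasks are registered with type `"other"`, with the intended type (`multiple_choice` or `generate_until`) left only as a trailing comment, so a filter on type like the one sketched here would group them with the perplexity task.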