open_pl_llm_leaderboard

djstrong committed on Mar 3

Commit 11d1872

1 Parent(s): f108808

update metrics to f1

Files changed (1)

src/about.py +12 -13

@@ -20,14 +20,14 @@ class Tasks(Enum):
     task7 = Task("polish_8tags_multiple_choice", "acc,none", "8tags_mc")
     task8 = Task("polish_8tags_regex", "exact_match,score-first", "8tags_g")
     task9 = Task("polish_belebele_regex", "exact_match,score-first", "belebele_g")
-    task10 = Task("polish_dyk_multiple_choice", "acc,none", "dyk_mc")
-    task11 = Task("polish_dyk_regex", "exact_match,score-first", "dyk_g")
+    task10 = Task("polish_dyk_multiple_choice", "f1,none", "dyk_mc")
+    task11 = Task("polish_dyk_regex", "f1,score-first", "dyk_g")
     task12 = Task("polish_ppc_multiple_choice", "acc,none", "ppc_mc")
     task13 = Task("polish_ppc_regex", "exact_match,score-first", "ppc_g")
-    task14 = Task("polish_psc_multiple_choice", "acc,none", "psc_mc")
-    task15 = Task("polish_psc_regex", "exact_match,score-first", "psc_g")
-    task16 = Task("polish_cbd_multiple_choice", "acc,none", "cbd_mc")
-    task17 = Task("polish_cbd_regex", "exact_match,score-first", "cbd_g")
+    task14 = Task("polish_psc_multiple_choice", "f1,none", "psc_mc")
+    task15 = Task("polish_psc_regex", "f1,score-first", "psc_g")
+    task16 = Task("polish_cbd_multiple_choice", "f1,none", "cbd_mc")
+    task17 = Task("polish_cbd_regex", "f1,score-first", "cbd_g")
     task18 = Task("polish_klej_ner_multiple_choice", "acc,none", "klej_ner_mc")
     task19 = Task("polish_klej_ner_regex", "exact_match,score-first", "klej_ner_g")
@@ -66,7 +66,6 @@ or join our [Discord SpeakLeash](https://discord.gg/3G9DVM39)
 ## TODO
-* change metrics for DYK, PSC, CBD(?)
 * fix long model names
 * add inference time
 * add metadata for models (e.g. #Params)
@@ -83,14 +82,14 @@ or join our [Discord SpeakLeash](https://discord.gg/3G9DVM39)
 - **polish_8tags_multiple_choice**: accuracy
 - **polish_8tags_regex**: accuracy
 - **polish_belebele_regex**: accuracy
-- **polish_dyk_multiple_choice**: accuracy - should be F1
-- **polish_dyk_regex**: accuracy - should be F1
+- **polish_dyk_multiple_choice**: accuracy - binary F1
+- **polish_dyk_regex**: accuracy - binary F1
 - **polish_ppc_multiple_choice**: accuracy
 - **polish_ppc_regex**: accuracy
-- **polish_psc_multiple_choice**: accuracy - should be F1
-- **polish_psc_regex**: accuracy - should be F1
-- **polish_cbd_multiple_choice**: accuracy  - should be F1?
-- **polish_cbd_regex**: accuracy - should be F1?
+- **polish_psc_multiple_choice**: accuracy - binary F1
+- **polish_psc_regex**: accuracy - binary F1
+- **polish_cbd_multiple_choice**: accuracy  - macro F1
+- **polish_cbd_regex**: accuracy - macro F1
 - **polish_klej_ner_multiple_choice**: accuracy
 - **polish_klej_ner_regex**: accuracy
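
Reviewer note: the commit switches DYK and PSC to binary F1 and CBD to macro F1. The sketch below is only an illustration of how those two averages differ from accuracy. It is not the leaderboard's or the evaluation harness's implementation. The function names (`f1_for_label`, `binary_f1`, `macro_f1`, `accuracy`), the choice of `1` as the positive label, and the toy labels are all assumptions made for the example.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when the class is never
    # present and never predicted.
    return 2 * tp / denom if denom else 0.0


def binary_f1(y_true, y_pred, positive=1):
    """Binary F1: score only the positive class (assumed to be 1)."""
    return f1_for_label(y_true, y_pred, positive)


def macro_f1(y_true, y_pred):
    """Macro F1: unweighted mean of per-class F1 over all seen labels."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


if __name__ == "__main__":
    # Toy data: 8 negatives, 2 positives; the model always predicts 0.
    y_true = [0] * 8 + [1] * 2
    y_pred = [0] * 10
    print("accuracy :", accuracy(y_true, y_pred))
    print("binary F1:", binary_f1(y_true, y_pred))
    print("macro F1 :", macro_f1(y_true, y_pred))
```

On this toy input a model that always predicts the majority class gets high accuracy but a binary F1 of zero, which is the kind of gap the old `should be F1` notes in the metric list point at.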