revert back to list as table can't be visualised
- src/display/about.py +19 -21
src/display/about.py
CHANGED
@@ -62,27 +62,25 @@ The total batch size we get for models which fit on one A100 node is 8 (8 GPUs *
 
 The tasks and few shots parameters are:
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-| <a href="https://aclanthology.org/2022.tacl-1.84/" target="_blank"> FaithDial </a> (`faithdial_hallu`) | 8 | `acc` |
-| <a href="https://aclanthology.org/D17-1082/" target="_blank"> RACE </a> (`race`) | 0 | `acc` |
+- <a href="https://aclanthology.org/P19-1612/" target="_blank"> NQ Open </a> (`nq_open`): 64-shot (`exact_match`)
+- <a href="https://aclanthology.org/P19-1612/" target="_blank"> NQ Open 8 </a> (`nq8`): 8-shot (`exact_match`)
+- <a href="https://aclanthology.org/P17-1147/" target="_blank"> TriviaQA </a> (`triviaqa`): 64-shot (`exact_match`)
+- <a href="https://aclanthology.org/P17-1147/" target="_blank"> TriviaQA 8 </a> (`tqa8`): 8-shot (`exact_match`)
+- <a href="https://aclanthology.org/2022.acl-long.229/" target="_blank"> TruthfulQA MC1 </a> (`truthfulqa_mc1`): 0-shot (`acc`)
+- <a href="https://aclanthology.org/2022.acl-long.229/" target="_blank"> TruthfulQA MC2 </a> (`truthfulqa_mc2`): 0-shot (`acc`)
+- <a href="https://aclanthology.org/2023.emnlp-main.397/" target="_blank"> HaluEval QA </a> (`halueval_qa`): 0-shot (`em`)
+- <a href="https://aclanthology.org/2023.emnlp-main.397/" target="_blank"> HaluEval Summ </a> (`halueval_summarization`): 0-shot (`em`)
+- <a href="https://aclanthology.org/2023.emnlp-main.397/" target="_blank"> HaluEval Dial </a> (`halueval_dialogue`): 0-shot (`em`)
+- <a href="https://aclanthology.org/2020.acl-main.173/" target="_blank"> XSum </a> (`xsum`): 2-shot (`rougeLsum`)
+- <a href="https://arxiv.org/abs/1704.04368" target="_blank"> CNN/DM </a> (`cnndm`): 2-shot (`rougeLsum`)
+- <a href="https://github.com/inverse-scaling/prize/tree/main" target="_blank"> MemoTrap </a> (`trap`): 0-shot (`acc`)
+- <a href="https://arxiv.org/abs/2311.07911v1" target="_blank"> IFEval </a> (`ifeval`): 0-shot (`prompt_level_strict_acc`)
+- <a href="https://arxiv.org/abs/2303.08896" target="_blank"> SelfCheckGPT </a> (`selfcheckgpt`): 0 (-)
+- <a href="https://arxiv.org/abs/1803.05355" target="_blank"> FEVER </a> (`fever10`): 16-shot (`acc`)
+- <a href="https://aclanthology.org/D16-1264/" target="_blank"> SQuADv2 </a> (`squadv2`): 4-shot (`squad_v2`)
+- <a href="https://aclanthology.org/2023.findings-emnlp.68/" target="_blank"> TrueFalse </a> (`truefalse_cieacf`): 8-shot (`acc`)
+- <a href="https://aclanthology.org/2022.tacl-1.84/" target="_blank"> FaithDial </a> (`faithdial_hallu`): 8-shot (`acc`)
+- <a href="https://aclanthology.org/D17-1082/" target="_blank"> RACE </a> (`race`): 0-shot (`acc`)
 
 For all these evaluations, a higher score is a better score.
 
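The added list entries all follow one pattern (link, task key, shot count, metric), so they could in principle be generated from structured data instead of being written by hand. The following is only a sketch under that assumption: `Task`, `render_task`, and `TASKS` are hypothetical names for illustration and are not taken from `src/display/about.py`, and only two of the entries are reproduced.

```python
from typing import NamedTuple


class Task(NamedTuple):
    """Hypothetical record for one evaluation task (not from the repository)."""
    title: str   # display name shown in the link
    url: str     # paper or project link
    key: str     # task identifier
    shots: int   # number of few-shot examples
    metric: str  # metric name


def render_task(task: Task) -> str:
    """Render one task as a bullet in the same shape as the added lines."""
    return (
        f'- <a href="{task.url}" target="_blank"> {task.title} </a> '
        f"(`{task.key}`): {task.shots}-shot (`{task.metric}`)"
    )


# Two entries copied from the diff; the rest would follow the same pattern.
TASKS = [
    Task("NQ Open", "https://aclanthology.org/P19-1612/", "nq_open", 64, "exact_match"),
    Task("RACE", "https://aclanthology.org/D17-1082/", "race", 0, "acc"),
]

task_list_md = "\n".join(render_task(t) for t in TASKS)
print(task_list_md)
```

An entry such as SelfCheckGPT, which the diff writes as `0 (-)` rather than `N-shot (metric)`, does not fit this pattern and would need separate handling.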