aaditya committed on
Commit 0202f10
1 Parent(s): f2d9565

Update src/about.py

Files changed (1)
  1. src/about.py +3 -0
src/about.py CHANGED
@@ -52,7 +52,10 @@ The datasets cover various aspects of medicine such as general medical knowledge
 The main evaluation metric used is Accuracy (ACC). Submit a model for automated evaluation on the "Submit" page. If you have comments or suggestions on additional medical datasets to include, please reach out to us in our discussion forum.
 
 
+
 The backend of the Open Medical LLM Leaderboard uses the Eleuther AI Language Model Evaluation Harness. More technical details can be found in the "About" page.
+The <a href="https://arxiv.org/abs/2303.13375">GPT-4</a>, and <a href="https://arxiv.org/abs/2305.09617">Med-PaLM-2</a> results are taken from their official papers. Since Med-PaLM doesn't provide zero-shot accuracy, we are using 5-shot accuracy from their paper for comparison. All results presented are in the zero-shot setting, except for Med-PaLM-2 which use 5-shot accuracy.
+
 """
 
 LLM_BENCHMARKS_TEXT = f"""
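The note added in this commit compares zero-shot accuracy with 5-shot accuracy. As a rough illustration only, the following minimal Python sketch shows accuracy computed over multiple-choice answers and how a k-shot prompt differs from a zero-shot one. It is not code from this repository or from the Eleuther AI harness. The function names `accuracy` and `build_prompt` and the "Question:/Answer:" template are hypothetical.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of items whose predicted answer exactly matches the gold answer."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def build_prompt(question: str, shots: Sequence[tuple[str, str]] = ()) -> str:
    """Prepend k solved (question, answer) examples; k = 0 gives a zero-shot prompt.

    Hypothetical template for illustration; it is not the template used by the
    evaluation harness or by the GPT-4 / Med-PaLM-2 papers.
    """
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in shots]
    # The target question always comes last, with the answer left for the model.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# 3 of 4 answers match the gold labels.
print(accuracy(["A", "C", "B", "D"], ["A", "B", "B", "D"]))  # 0.75
```

A 5-shot prompt is `build_prompt(question, shots)` with five solved examples, which gives the model extra in-context guidance. This is why a 5-shot score is not strictly comparable to the zero-shot scores reported for the other models.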