Spaces: m42-health/clinical_ner_leaderboard

wadood committed on Sep 27

Commit 0c2f99e • 1 Parent(s): 28687f6

removed unreadable pie chart

Files changed (2)

app.py +3 -3
src/about.py +1 -3

app.py

@@ -13,7 +13,7 @@ from src.about import (
     LLM_BENCHMARKS_TEXT_1,
     EVALUATION_EXAMPLE_IMG,
     LLM_BENCHMARKS_TEXT_2,
-    ENTITY_DISTRIBUTION_IMG,
+    # ENTITY_DISTRIBUTION_IMG,
     LLM_BENCHMARKS_TEXT_3,
     TITLE,
     LOGO
@@ -83,7 +83,7 @@ token_based_types_leaderboard_df = token_based_types_original_df.copy()
 def update_df(evaluation_metric, shown_columns, subset="datasets"):
-    print(evaluation_metric)
+    # print(evaluation_metric)
     if subset == "datasets":
         match evaluation_metric:
@@ -506,7 +506,7 @@ with demo:
             gr.Markdown(LLM_BENCHMARKS_TEXT_1, elem_classes="markdown-text")
             gr.HTML(EVALUATION_EXAMPLE_IMG, elem_classes="logo")
             gr.Markdown(LLM_BENCHMARKS_TEXT_2, elem_classes="markdown-text")
-            gr.HTML(ENTITY_DISTRIBUTION_IMG, elem_classes="logo")
+            # gr.HTML(ENTITY_DISTRIBUTION_IMG, elem_classes="logo")
             gr.Markdown(LLM_BENCHMARKS_TEXT_3, elem_classes="markdown-text")
         with gr.TabItem("🚀 Submit here! ", elem_id="llm-benchmark-tab-table", id=3):

src/about.py

@@ -184,8 +184,6 @@ The above datasets are modified to cater to the clinical setting. For this, the
 | Gene            | 1180                |
 | Gene Variant    | 241                 |
-The pie chart on the left below the distribution of clinical entities and their original dataset types.
 """
 ENTITY_DISTRIBUTION_IMG = """<img src="file/assets/entity_distribution.png" alt="Clinical X HF" width="750" height="500">"""
@@ -214,7 +212,7 @@ He had been diagnosed with <span class="disease" >osteoarthritis of the knees</s
 After the tagged output is generated, it is parsed to extract the tagged entities. The parsed data are then compared against the gold standard labels, and performance metrics are computed as above. This evaluation method ensures a consistent and objective assessment of decoder-only LLM's performance in NER tasks, despite the differences in their architecture compared to encoder models.
 # Reproducibility
-To reproduce our results, follow the steps detailed [here](https://github.com/WadoodAbdul/medics_ner/blob/master/docs/reproducing_results.md)
+To reproduce our results, follow the steps detailed [here](https://github.com/WadoodAbdul/clinical_ner_benchmark/blob/master/docs/reproducing_results.md)
 # Disclaimer and Advisory
 The Leaderboard is maintained by the authors and affiliated entity as part of our ongoing contribution to open research in the field of NLP in healthcare. The leaderboard is intended for academic and exploratory purposes only. The language models evaluated on this platform (to the best knowledge of the authors) have not been approved for clinical use, and their performance should not be interpreted as clinically validated or suitable for real-world medical applications.
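
Note on the src/about.py context: the evaluation text in this hunk says the tagged output is parsed to extract entities, which are then compared against gold labels. A rough sketch of that parsing step follows. It is not the leaderboard's actual code. It assumes the `<span class="TYPE" >text</span>` format seen in the truncated hunk header, with a closing `</span>`. The names `parse_tagged_output` and `span_f1` are hypothetical, and exact-match F1 is only a stand-in for the metrics the text refers to as computed "as above".

```python
import re
from typing import List, Tuple

# Assumed tag format: <span class="TYPE" >text</span>
# (whitespace before '>' is optional, mirroring the example in the hunk header).
SPAN_RE = re.compile(r'<span\s+class="([^"]+)"\s*>(.*?)</span>', re.DOTALL)

# (start, end, label, text), with offsets into the untagged text.
Entity = Tuple[int, int, str, str]


def parse_tagged_output(tagged: str) -> Tuple[str, List[Entity]]:
    """Strip span tags and return the plain text plus the tagged entities."""
    plain_parts: List[str] = []
    entities: List[Entity] = []
    cursor = 0  # read position in the tagged string
    offset = 0  # length of the plain text emitted so far
    for match in SPAN_RE.finditer(tagged):
        before = tagged[cursor:match.start()]
        plain_parts.append(before)
        offset += len(before)
        label, text = match.group(1), match.group(2)
        entities.append((offset, offset + len(text), label, text))
        plain_parts.append(text)
        offset += len(text)
        cursor = match.end()
    plain_parts.append(tagged[cursor:])
    return "".join(plain_parts), entities


def span_f1(pred: List[Entity], gold: List[Entity]) -> float:
    """Exact-match F1 over (start, end, label); an illustrative metric only."""
    pred_set = {(s, e, label) for s, e, label, _ in pred}
    gold_set = {(s, e, label) for s, e, label, _ in gold}
    true_pos = len(pred_set & gold_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)
```

Working with character offsets into the untagged text means predictions can be compared with gold spans regardless of how the model's markup was spaced.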