removed unreadable pie chart
- app.py +3 -3
- src/about.py +1 -3
app.py
CHANGED
@@ -13,7 +13,7 @@ from src.about import (
     LLM_BENCHMARKS_TEXT_1,
     EVALUATION_EXAMPLE_IMG,
     LLM_BENCHMARKS_TEXT_2,
-    ENTITY_DISTRIBUTION_IMG,
+    # ENTITY_DISTRIBUTION_IMG,
     LLM_BENCHMARKS_TEXT_3,
     TITLE,
     LOGO
@@ -83,7 +83,7 @@ token_based_types_leaderboard_df = token_based_types_original_df.copy()
 
 
 def update_df(evaluation_metric, shown_columns, subset="datasets"):
-    print(evaluation_metric)
+    # print(evaluation_metric)
 
     if subset == "datasets":
         match evaluation_metric:
@@ -506,7 +506,7 @@ with demo:
             gr.Markdown(LLM_BENCHMARKS_TEXT_1, elem_classes="markdown-text")
             gr.HTML(EVALUATION_EXAMPLE_IMG, elem_classes="logo")
             gr.Markdown(LLM_BENCHMARKS_TEXT_2, elem_classes="markdown-text")
-            gr.HTML(ENTITY_DISTRIBUTION_IMG, elem_classes="logo")
+            # gr.HTML(ENTITY_DISTRIBUTION_IMG, elem_classes="logo")
             gr.Markdown(LLM_BENCHMARKS_TEXT_3, elem_classes="markdown-text")
 
         with gr.TabItem("🚀 Submit here! ", elem_id="llm-benchmark-tab-table", id=3):
src/about.py
CHANGED
@@ -184,8 +184,6 @@ The above datasets are modified to cater to the clinical setting. For this, the
 | Gene | 1180 |
 | Gene Variant | 241 |
 
-
-The pie chart on the left below the distribution of clinical entities and their original dataset types.
 """
 
 ENTITY_DISTRIBUTION_IMG = """<img src="file/assets/entity_distribution.png" alt="Clinical X HF" width="750" height="500">"""
@@ -214,7 +212,7 @@ He had been diagnosed with <span class="disease" >osteoarthritis of the knees</s
 After the tagged output is generated, it is parsed to extract the tagged entities. The parsed data are then compared against the gold standard labels, and performance metrics are computed as above. This evaluation method ensures a consistent and objective assessment of decoder-only LLM's performance in NER tasks, despite the differences in their architecture compared to encoder models.
 
 # Reproducibility
-To reproduce our results, follow the steps detailed [here](https://github.com/WadoodAbdul/
+To reproduce our results, follow the steps detailed [here](https://github.com/WadoodAbdul/clinical_ner_benchmark/blob/master/docs/reproducing_results.md)
 
 # Disclaimer and Advisory
 The Leaderboard is maintained by the authors and affiliated entity as part of our ongoing contribution to open research in the field of NLP in healthcare. The leaderboard is intended for academic and exploratory purposes only. The language models evaluated on this platform (to the best knowledge of the authors) have not been approved for clinical use, and their performance should not be interpreted as clinically validated or suitable for real-world medical applications.
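
Note on the evaluation text in src/about.py: the context line describes parsing the tagged output to extract entities and comparing them against gold labels. The following is a rough sketch of that idea, not the benchmark's actual code. The names `SPAN_RE`, `extract_entities`, and `precision_recall_f1` are hypothetical, the sample sentence is modeled on the truncated hunk header with an assumed closing `</span>`, and exact-match scoring over (label, text) pairs is a simplification that may differ from the metrics the leaderboard reports.

```python
import re

# Hypothetical pattern: matches <span class="LABEL">TEXT</span>, tolerating
# whitespace before the closing ">" of the opening tag.
SPAN_RE = re.compile(r'<span\s+class="([^"]+)"\s*>(.*?)</span>', re.DOTALL)


def extract_entities(tagged_text):
    """Return (label, entity_text) pairs found in span-tagged model output."""
    return [(label, text.strip()) for label, text in SPAN_RE.findall(tagged_text)]


def precision_recall_f1(predicted, gold):
    """Exact-match scores over sets of (label, text) pairs (a simplification)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative input only; the closing tag is assumed, not taken from the diff.
output = ('He had been diagnosed with '
          '<span class="disease" >osteoarthritis of the knees</span>.')
entities = extract_entities(output)
print(entities)  # [('disease', 'osteoarthritis of the knees')]

gold = [("disease", "osteoarthritis of the knees"), ("drug", "ibuprofen")]
print(precision_recall_f1(entities, gold))
```

Here the single predicted entity matches one of two made-up gold entities, so precision is 1.0 and recall is 0.5. The steps for reproducing the reported numbers are in the document linked from the Reproducibility section.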