WadoodAbdul committed 48c47ed (1 parent: 17c27e0)
updated metrics comparison

src/about.py CHANGED (+5 -5)
@@ -63,9 +63,7 @@ LLM_BENCHMARKS_TEXT_1 = f"""

The Named Clinical Entity Recognition Leaderboard is aimed at advancing the field of natural language processing in healthcare. It provides a standardized platform for evaluating and comparing the performance of various language models in recognizing named clinical entities, a critical task for applications such as clinical documentation, decision support, and information extraction. By fostering transparency and facilitating benchmarking, the leaderboard's goal is to drive innovation and improvement in NLP models. It also helps researchers identify the strengths and weaknesses of different approaches, ultimately contributing to the development of more accurate and reliable tools for clinical use. Despite its exploratory nature, the leaderboard aims to play a role in guiding research and ensuring that advancements are grounded in rigorous and comprehensive evaluations.

-##
-
-### Evaluation method and metrics
+## Evaluation method and metrics
When training a Named Entity Recognition (NER) system, the most common evaluation methods involve measuring precision, recall, and F1-score at the token level. While these metrics are useful for fine-tuning the NER system, evaluating the predicted named entities for downstream tasks requires metrics at the full named-entity level. We include both evaluation methods: token-based and span-based. We provide an example below which helps in understanding the difference between the methods.
Example Sentence: "The patient was diagnosed with a skin cancer disease."
For simplicity, let's assume an example sentence which contains 10 tokens, with a single two-token disease entity (as shown in the figure below).
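To make the token-based vs span-based distinction described in this hunk concrete, here is a minimal, self-contained sketch (not the leaderboard's evaluation code). The BIO tags, the DISEASE label, and a hypothetical prediction that captures only "cancer" from the two-token "skin cancer" entity are assumptions for illustration.

```python
# Sketch only: contrasts token-level and span-level (partial overlap) scoring
# on the example sentence. The prediction is hypothetical.
tokens = ["The", "patient", "was", "diagnosed", "with",
          "a", "skin", "cancer", "disease", "."]
gold = ["O", "O", "O", "O", "O", "O", "B-DISEASE", "I-DISEASE", "O", "O"]
pred = ["O", "O", "O", "O", "O", "O", "O", "B-DISEASE", "O", "O"]  # misses "skin"
assert len(tokens) == len(gold) == len(pred)

# Token-based: compare labels position by position (a simplified view; the
# actual token-based metric macro-averages over entity types).
tp = sum(g != "O" and p != "O" for g, p in zip(gold, pred))  # 1 ("cancer")
fp = sum(g == "O" and p != "O" for g, p in zip(gold, pred))  # 0
fn = sum(g != "O" and p == "O" for g, p in zip(gold, pred))  # 1 ("skin")
print("token-level P/R:", tp / (tp + fp), tp / (tp + fn))    # 1.0, 0.5

# Span-based with partial overlap: a predicted entity counts as correct
# if it overlaps a gold entity of the same type.
def spans(labels):
    """Return (start, end) spans, end-exclusive, from BIO labels."""
    out, start = [], None
    for i, lab in enumerate(labels + ["O"]):
        if (lab == "O" or lab.startswith("B-")) and start is not None:
            out.append((start, i))
            start = None
        if lab.startswith("B-"):
            start = i
    return out

gold_spans, pred_spans = spans(gold), spans(pred)  # [(6, 8)], [(7, 8)]
cor = sum(any(ps < ge and gs < pe for gs, ge in gold_spans)
          for ps, pe in pred_spans)
print("span-level (partial overlap) P/R:",
      cor / len(pred_spans), cor / len(gold_spans))  # 1.0, 1.0
```

Token-level scoring penalises the missed "skin" token, while the partial-overlap span view credits the overlapping prediction in full, which is exactly the gap the two evaluation methods are meant to expose.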
@@ -111,9 +109,11 @@ $$ Precision = COR / (COR + INC + SPU)$$
$$ Recall = COR / (COR + INC + MIS)$$
$$ f1score = 2 * (Prec * Rec) / (Prec + Rec)$$

+Note:
+1. The span-based approach here is equivalent to the 'Span Based Evaluation with Partial Overlap' in [NER Metrics Showdown!](https://huggingface.co/spaces/wadood/ner_evaluation_metrics) and to Partial Match ("Type") in the nervaluate python package.
+2. The token-based approach here is equivalent to the 'Token Based Evaluation With Macro Average' in [NER Metrics Showdown!](https://huggingface.co/spaces/wadood/ner_evaluation_metrics).

-
-Further examples are presented the section below (Other example evaluations).
+Additional examples can be tested on the [NER Metrics Showdown!](https://huggingface.co/spaces/wadood/ner_evaluation_metrics) Hugging Face space.

## Datasets
The following datasets (test splits only) have been included in the evaluation.
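For reference, the span-level formulas in this hunk translate directly into code. A minimal sketch follows; the COR/INC/MIS/SPU counts are made up for illustration, not leaderboard results.

```python
# COR = correct, INC = wrong entity type, MIS = missed, SPU = spurious.
def span_scores(cor: int, inc: int, mis: int, spu: int):
    """Precision, recall and F1 as defined in the formulas above."""
    precision = cor / (cor + inc + spu) if (cor + inc + spu) else 0.0
    recall = cor / (cor + inc + mis) if (cor + inc + mis) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Example with placeholder counts:
print(span_scores(cor=8, inc=1, mis=2, spu=1))  # (0.8, 0.727..., 0.761...)
```

In practice, the per-scheme correct/incorrect/missed/spurious counts can be obtained from a tool such as the nervaluate package referenced in the note above.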