WadoodAbdul committed 48c47ed (1 parent: 17c27e0)
updated metrics comparison

src/about.py CHANGED (+5 -5)
@@ -63,9 +63,7 @@ LLM_BENCHMARKS_TEXT_1 = f"""

The Named Clinical Entity Recognition Leaderboard is aimed at advancing the field of natural language processing in healthcare. It provides a standardized platform for evaluating and comparing the performance of various language models in recognizing named clinical entities, a critical task for applications such as clinical documentation, decision support, and information extraction. By fostering transparency and facilitating benchmarking, the leaderboard's goal is to drive innovation and improvement in NLP models. It also helps researchers identify the strengths and weaknesses of different approaches, ultimately contributing to the development of more accurate and reliable tools for clinical use. Despite its exploratory nature, the leaderboard aims to play a role in guiding research and ensuring that advancements are grounded in rigorous and comprehensive evaluations.

-##
-
-### Evaluation method and metrics
+## Evaluation method and metrics
When training a Named Entity Recognition (NER) system, the most common evaluation methods involve measuring precision, recall, and F1-score at the token level. While these metrics are useful for fine-tuning the NER system, evaluating the predicted named entities for downstream tasks requires metrics at the full named-entity level. We include both evaluation methods: token-based and span-based. We provide an example below which helps in understanding the difference between the methods.
Example Sentence: "The patient was diagnosed with a skin cancer disease."
For simplicity, let's assume an example sentence which contains 10 tokens, with a single two-token disease entity (as shown in the figure below).
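To make the token-based vs span-based distinction described in this hunk concrete, here is a minimal, self-contained sketch (not the leaderboard's evaluation code). The BIO tags, the DISEASE label, and a hypothetical prediction that captures only "cancer" from the two-token "skin cancer" entity are assumptions for illustration.

```python
# Sketch only: contrasts token-level and span-level (partial overlap) scoring
# on the example sentence. The prediction is hypothetical.
tokens = ["The", "patient", "was", "diagnosed", "with",
          "a", "skin", "cancer", "disease", "."]
gold = ["O", "O", "O", "O", "O", "O", "B-DISEASE", "I-DISEASE", "O", "O"]
pred = ["O", "O", "O", "O", "O", "O", "O", "B-DISEASE", "O", "O"]  # misses "skin"
assert len(tokens) == len(gold) == len(pred)

# Token-based: compare labels position by position (a simplified view; the
# actual token-based metric macro-averages over entity types).
tp = sum(g != "O" and p != "O" for g, p in zip(gold, pred))  # 1 ("cancer")
fp = sum(g == "O" and p != "O" for g, p in zip(gold, pred))  # 0
fn = sum(g != "O" and p == "O" for g, p in zip(gold, pred))  # 1 ("skin")
print("token-level P/R:", tp / (tp + fp), tp / (tp + fn))    # 1.0, 0.5

# Span-based with partial overlap: a predicted entity counts as correct
# if it overlaps a gold entity of the same type.
def spans(labels):
    """Return (start, end) spans, end-exclusive, from BIO labels."""
    out, start = [], None
    for i, lab in enumerate(labels + ["O"]):
        if (lab == "O" or lab.startswith("B-")) and start is not None:
            out.append((start, i))
            start = None
        if lab.startswith("B-"):
            start = i
    return out

gold_spans, pred_spans = spans(gold), spans(pred)  # [(6, 8)], [(7, 8)]
cor = sum(any(ps < ge and gs < pe for gs, ge in gold_spans)
          for ps, pe in pred_spans)
print("span-level (partial overlap) P/R:",
      cor / len(pred_spans), cor / len(gold_spans))  # 1.0, 1.0
```

Token-level scoring penalises the missed "skin" token, while the partial-overlap span view credits the overlapping prediction in full, which is exactly the gap the two evaluation methods are meant to expose.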
@@ -111,9 +109,11 @@ $$ Precision = COR / (COR + INC + SPU)$$
$$ Recall = COR / (COR + INC + MIS)$$
$$ f1score = 2 * (Prec * Rec) / (Prec + Rec)$$

+Note:
+1. The span-based approach here is equivalent to the 'Span Based Evaluation with Partial Overlap' in [NER Metrics Showdown!](https://huggingface.co/spaces/wadood/ner_evaluation_metrics) and to Partial Match ("Type") in the nervaluate python package.
+2. The token-based approach here is equivalent to the 'Token Based Evaluation With Macro Average' in [NER Metrics Showdown!](https://huggingface.co/spaces/wadood/ner_evaluation_metrics).

-
-Further examples are presented the section below (Other example evaluations).
+Additional examples can be tested on the [NER Metrics Showdown!](https://huggingface.co/spaces/wadood/ner_evaluation_metrics) Hugging Face space.

## Datasets
The following datasets (test splits only) have been included in the evaluation.
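For reference, the span-level formulas in this hunk translate directly into code. A minimal sketch follows; the COR/INC/MIS/SPU counts are made up for illustration, not leaderboard results.

```python
# COR = correct, INC = wrong entity type, MIS = missed, SPU = spurious.
def span_scores(cor: int, inc: int, mis: int, spu: int):
    """Precision, recall and F1 as defined in the formulas above."""
    precision = cor / (cor + inc + spu) if (cor + inc + spu) else 0.0
    recall = cor / (cor + inc + mis) if (cor + inc + mis) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Example with placeholder counts:
print(span_scores(cor=8, inc=1, mis=2, spu=1))  # (0.8, 0.727..., 0.761...)
```

In practice, the per-scheme correct/incorrect/missed/spurious counts can be obtained from a tool such as the nervaluate package referenced in the note above.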