Spanish
richardjonker2000 committed on
Commit
bd2bf08
1 Parent(s): 88880eb

Update README.md

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -12,7 +12,7 @@ metrics:
 
 Our model focuses on Biomedical Named Entity Recognition (NER) in Spanish clinical texts, crucial for automated information extraction in medical research and treatment improvements. It proposes a novel approach using a Multi-Head Conditional Random Field (CRF) classifier to tackle multi-class NER tasks, overcoming challenges of overlapping entity instances. The classes it recognizes include symptoms, procedures, diseases, chemicals, and proteins.
 
-We provide 4 different, models, available as branches of this repository.
+We provide 4 different models, available as branches of this repository.
 
 ## Model Details
 
@@ -51,14 +51,14 @@ Please refer to our GitHub repository for more information on how to train the m
 
 The training data can be found on IEETA/SPACCC-Spanish-NER, which is further described on the dataset card.
 The dataset used consists of 4 seperate datasets:
-- [MedProcNer](https://zenodo.org/records/8224056)
+- [SympTEMIST](https://zenodo.org/records/10635215)
+- [MedProcNER](https://zenodo.org/records/8224056)
 - [DisTEMIST](https://zenodo.org/records/7614764)
 - [PharmaCoNER](https://zenodo.org/records/4270158)
-- [SympTEMIST](https://zenodo.org/records/10635215)
 
 ### Speeds, Sizes, Times
 
-The models were trained using an Nvidia Quadra RTX 8000. The models for 5 classes took approximately 1 hour to train and occupy around 1GB of disk space. Additionally, this model shows linear complexity (+8 minutes) per entity class to classify.
+The models were trained using an Nvidia Quadro RTX 8000. The models for 5 classes took approximately 1 hour to train and occupy around 1GB of disk space. Additionally, this model shows linear complexity (+8 minutes) per entity class to classify.
 
 ### Testing Data, Factors & Metrics
 
@@ -67,7 +67,7 @@ The testing data can be found on IEETA/SPACCC-Spanish-NER, which is further desc
 
 #### Metrics
 
-The models were evaluated using the F1 score metric, the standard for entity recognition tasks.
+The models were evaluated using the micro-averaged F1-score metric, the standard for entity recognition tasks.
 
 ### Results
 
@@ -80,7 +80,7 @@ We provide 4 separate models with various hyperparameter changes:
 | 3 | None | - | - | **78.89** |
 | 1 | Random | 0.25 | 0.50 | **78.89** |
 
-All models are trained with a context size of 32 for 60 epochs.
+All models are trained with a context size of 32 tokens for 60 epochs.
 
 
 ## Citation