Commit bd2bf08 by richardjonker2000
1 Parent(s): 88880eb

Update README.md
README.md
CHANGED
@@ -12,7 +12,7 @@ metrics:
 
 Our model focuses on Biomedical Named Entity Recognition (NER) in Spanish clinical texts, crucial for automated information extraction in medical research and treatment improvements. It proposes a novel approach using a Multi-Head Conditional Random Field (CRF) classifier to tackle multi-class NER tasks, overcoming challenges of overlapping entity instances. The classes it recognizes include symptoms, procedures, diseases, chemicals, and proteins.
 
-We provide 4 different
+We provide 4 different models, available as branches of this repository.
 
 ## Model Details
 
@@ -51,14 +51,14 @@ Please refer to our GitHub repository for more information on how to train the m
 
 The training data can be found on IEETA/SPACCC-Spanish-NER, which is further described on the dataset card.
 The dataset used consists of 4 separate datasets:
-- [
+- [SympTEMIST](https://zenodo.org/records/10635215)
+- [MedProcNER](https://zenodo.org/records/8224056)
 - [DisTEMIST](https://zenodo.org/records/7614764)
 - [PharmaCoNER](https://zenodo.org/records/4270158)
-- [SympTEMIST](https://zenodo.org/records/10635215)
 
 ### Speeds, Sizes, Times
 
-The models were trained using an Nvidia
+The models were trained using an Nvidia Quadro RTX 8000. The models for 5 classes took approximately 1 hour to train and occupy around 1GB of disk space. Additionally, this model shows linear complexity (+8 minutes) per entity class to classify.
 
 ### Testing Data, Factors & Metrics
 
@@ -67,7 +67,7 @@ The testing data can be found on IEETA/SPACCC-Spanish-NER, which is further desc
 
 #### Metrics
 
-The models were evaluated using the F1
+The models were evaluated using the micro-averaged F1-score metric, the standard for entity recognition tasks.
 
 ### Results
 
@@ -80,7 +80,7 @@ We provide 4 separate models with various hyperparameter changes:
 | 3 | None | - | - | **78.89** |
 | 1 | Random | 0.25 | 0.50 | **78.89** |
 
-All models are trained with a context size of 32 tokens for 60 epochs.
+All models are trained with a context size of 32 tokens for 60 epochs.
 
 
 ## Citation
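The Metrics hunk in this diff names the micro-averaged F1-score. As a rough illustration of what that means for multi-class NER, the sketch below pools true positives, false positives, and false negatives across all entity classes before computing precision and recall. It assumes exact-match scoring over `(doc_id, start, end)` spans; the function name `micro_f1` and the sample data are hypothetical and are not taken from the repository's evaluation code, which may count matches differently.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over entity spans.

    gold, pred: dict mapping class name -> set of (doc_id, start, end) spans.
    Assumption: a prediction counts only if it matches a gold span exactly.
    """
    tp = fp = fn = 0
    # Pool the counts over every class seen in either gold or predictions.
    for label in set(gold) | set(pred):
        g = gold.get(label, set())
        p = pred.get(label, set())
        tp += len(g & p)   # spans found in both
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two classes, one missed disease, one spurious symptom.
gold = {
    "disease": {("d1", 0, 5), ("d1", 10, 18)},
    "symptom": {("d1", 20, 25)},
}
pred = {
    "disease": {("d1", 0, 5)},
    "symptom": {("d1", 20, 25), ("d1", 30, 34)},
}
score = micro_f1(gold, pred)
```

Because the counts are pooled before averaging, classes with many entities weigh more heavily in the final score than rare ones, which is the main difference from a macro-averaged F1.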