---
tags:
- generated_from_trainer
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: bert-base-spanish-wwm-uncased-finetuned-NER-medical
  results: []
widget:
- text: "El útero o matriz es el lugar donde se desarrolla el bebé cuando una mujer está embarazada."
- text: "El síndrome de dolor regional complejo es un trastorno de dolor crónico."
---
# bert-base-spanish-wwm-uncased-finetuned-NER-medical
This model is a fine-tuned version of [dccuchile/bert-base-spanish-wwm-uncased](https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased) on an adaptation of the [eHealth-KD Challenge 2020 dataset](https://knowledge-learning.github.io/ehealthkd-2020/), filtered for the named entity recognition (NER) task only. The NER annotations in the dataset are ['Concept', 'Action', 'Predicate', 'Reference'].
Before training, the dataset was processed to label it in the BIO annotation format. Some cleaning and adaptations were needed, for example to handle overlapping entities.
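As an illustration of the BIO format mentioned above, here is a minimal sketch that turns token-level entity spans into BIO tags. The function name `spans_to_bio`, the span layout `(start, end_exclusive, type)`, and the example spans are assumptions made for illustration; they are not taken from the dataset or from the preprocessing code used for this model. The sketch also assumes non-overlapping spans and does not reproduce the overlap handling mentioned above.

```python
from typing import List, Tuple

# Entity types listed in this card.
ENTITY_TYPES = ["Concept", "Action", "Predicate", "Reference"]


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert token-level spans (start, end_exclusive, type) into BIO tags.

    Assumption: spans do not overlap. Overlapping entities would need an
    extra resolution step that this sketch does not implement.
    """
    tags = ["O"] * len(tokens)
    for start, end, entity_type in spans:
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        tags[start] = f"B-{entity_type}"      # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{entity_type}"      # remaining tokens of the entity
    return tags


# Hypothetical annotation of a sentence fragment, for illustration only.
tokens = ["El", "síndrome", "de", "dolor", "regional", "complejo", "es", "un", "trastorno"]
spans = [(1, 6, "Concept"), (8, 9, "Concept")]
bio_tags = spans_to_bio(tokens, spans)
print(list(zip(tokens, bio_tags)))
```

Tokens outside any entity receive `O`, the first token of an entity receives `B-<type>`, and the following tokens of the same entity receive `I-<type>`.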
The model achieves the following results on the evaluation set:
- Loss: 0.6433
- Precision: 0.8297
- Recall: 0.8367
- F1: 0.8332
- Accuracy: 0.8876
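As a quick consistency check, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the two values reported above; it only checks the arithmetic and makes no claim about how the metrics were computed during evaluation.

```python
# Values copied from the evaluation results listed above.
precision = 0.8297
recall = 0.8367

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))
```

Rounded to four decimals, the result agrees with the reported F1 of 0.8332.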
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
The chapter [“Token classification”](https://huggingface.co/course/chapter7/2) of the Hugging Face online course was used as the starting point for the training process. We made some adaptations because our dataset follows a slightly different structure. In addition, a conversion between string labels and integer labels was needed.
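The conversion between string labels and integer labels can be sketched as a pair of lookup dictionaries. The label order below (`O` first, then `B-`/`I-` pairs per entity type) and the helper names `encode` and `decode` are assumptions for illustration; the actual id assignment used for this model is not stated in this card.

```python
# Entity types listed in this card.
ENTITY_TYPES = ["Concept", "Action", "Predicate", "Reference"]

# Assumed label order: "O", then B-/I- tags for each entity type.
label_names = ["O"] + [
    f"{prefix}-{entity_type}"
    for entity_type in ENTITY_TYPES
    for prefix in ("B", "I")
]

label2id = {label: idx for idx, label in enumerate(label_names)}
id2label = {idx: label for label, idx in label2id.items()}


def encode(tags):
    """Map BIO string tags to integer ids."""
    return [label2id[tag] for tag in tags]


def decode(ids):
    """Map integer ids back to BIO string tags."""
    return [id2label[idx] for idx in ids]


example_tags = ["O", "B-Concept", "I-Concept", "O", "B-Action"]
example_ids = encode(example_tags)
print(example_ids)
print(decode(example_ids))
```

Dictionaries of this shape are what `AutoModelForTokenClassification.from_pretrained` accepts through its `id2label` and `label2id` arguments, as in the course chapter linked above.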
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 12
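The sketch below shows one way these values could map onto keyword arguments of `transformers.TrainingArguments`, and how a linear schedule lowers the learning rate over training. Several points are assumptions: mapping `train_batch_size` to `per_device_train_batch_size` presumes a single device, the warmup length is taken to be zero (the library default, not stated in this card), and the total of 600 steps comes from the last row of the training results table. The helper `linear_lr` is written here for illustration and is not a library function.

```python
# Keyword names follow transformers.TrainingArguments; values come from the list above.
# Assumption: a single device, so train_batch_size equals the per-device batch size.
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 12,
}

TOTAL_STEPS = 600  # last step in the training results table (12 epochs x 50 steps)


def linear_lr(step, base_lr=2e-5, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero.

    Assumption: warmup_steps=0, which this card does not state explicitly.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 300, 600):
    print(step, linear_lr(step))
```

With the `transformers` library installed, the dictionary could be passed as `TrainingArguments(output_dir="out", **training_kwargs)`, where `output_dir="out"` is a placeholder path.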
### Training results
| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| 0.1139 | 1.0 | 50 | 0.3932 | 0.8671 | 0.8378 | 0.8522 | 0.9004 |
| 0.074 | 2.0 | 100 | 0.4334 | 0.8682 | 0.8367 | 0.8522 | 0.9004 |
| 0.0564 | 3.0 | 150 | 0.4498 | 0.8654 | 0.8353 | 0.8501 | 0.8993 |
| 0.0431 | 4.0 | 200 | 0.4683 | 0.8629 | 0.8425 | 0.8526 | 0.8985 |
| 0.0328 | 5.0 | 250 | 0.4850 | 0.8508 | 0.8454 | 0.8481 | 0.8964 |
| 0.027 | 6.0 | 300 | 0.4983 | 0.8608 | 0.8432 | 0.8519 | 0.8988 |
| 0.0253 | 7.0 | 350 | 0.5334 | 0.8618 | 0.8457 | 0.8537 | 0.9004 |
| 0.0242 | 8.0 | 400 | 0.5546 | 0.8636 | 0.8450 | 0.8542 | 0.9009 |
| 0.0233 | 9.0 | 450 | 0.5507 | 0.8543 | 0.8436 | 0.8489 | 0.8961 |
| 0.0203 | 10.0 | 500 | 0.5410 | 0.8605 | 0.8432 | 0.8518 | 0.9001 |
| 0.0179 | 11.0 | 550 | 0.5547 | 0.8603 | 0.8507 | 0.8555 | 0.9006 |
| 0.0149 | 12.0 | 600 | 0.5568 | 0.8616 | 0.8446 | 0.8531 | 0.9012 |
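The table can also be used to compare checkpoints. The sketch below picks the epoch with the highest F1 and the epoch with the lowest validation loss from the rows above. It is only an illustration of checkpoint selection; this card does not state which checkpoint was published.

```python
# (epoch, validation_loss, f1) copied from the training results table above.
results = [
    (1, 0.3932, 0.8522),
    (2, 0.4334, 0.8522),
    (3, 0.4498, 0.8501),
    (4, 0.4683, 0.8526),
    (5, 0.4850, 0.8481),
    (6, 0.4983, 0.8519),
    (7, 0.5334, 0.8537),
    (8, 0.5546, 0.8542),
    (9, 0.5507, 0.8489),
    (10, 0.5410, 0.8518),
    (11, 0.5547, 0.8555),
    (12, 0.5568, 0.8531),
]

# Select by highest F1 and, separately, by lowest validation loss.
best_by_f1 = max(results, key=lambda row: row[2])
best_by_loss = min(results, key=lambda row: row[1])

print("best F1:", best_by_f1)
print("lowest validation loss:", best_by_loss)
```

The two criteria select different epochs here (epoch 11 by F1, epoch 1 by validation loss), so the choice of selection metric matters for this run.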
### Framework versions
- Transformers 4.17.0
- Pytorch 1.10.0+cu111
- Datasets 2.0.0
- Tokenizers 0.11.6