metadata
tags:
  - generated_from_trainer
metrics:
  - precision
  - recall
  - f1
  - accuracy
model-index:
  - name: bert-base-spanish-wwm-uncased-finetuned-NER-medical
    results: []
widget:
  - text: >-
      El útero o matriz es el lugar donde se desarrolla el bebé cuando una mujer
      está embarazada.
  - text: El síndrome de dolor regional complejo es un trastorno de dolor crónico.

bert-base-spanish-wwm-uncased-finetuned-NER-medical

This model is a fine-tuned version of dccuchile/bert-base-spanish-wwm-uncased on an adaptation of the eHealth-KD Challenge 2020 dataset, filtered to keep only the NER task. The NER annotation labels are ['Concept', 'Action', 'Predicate', 'Reference'].

Before training, the dataset was preprocessed and labeled in the BIO annotation format. Some cleaning and adaptation was needed, for example to handle overlapping entities.
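To illustrate the BIO format, here is a minimal sketch that turns token-level entity spans into BIO tags. The function name `spans_to_bio`, the example annotations, and the overlap rule (keep the earlier, longer span and skip any span that collides with it) are assumptions for illustration only; this card does not describe how overlapping entities were actually resolved.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token spans (end exclusive) into BIO tags.

    Overlap handling is an assumption: spans are visited left to right,
    longest first, and a span touching an already tagged token is skipped.
    """
    tags = ["O"] * len(tokens)
    ordered = sorted(spans, key=lambda s: (s[0], -(s[1] - s[0])))
    for start, end, label in ordered:
        if any(tag != "O" for tag in tags[start:end]):
            continue  # overlapping span: dropped in this sketch
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical annotations on one of the widget sentences.
tokens = ["El", "síndrome", "de", "dolor", "regional", "complejo",
          "es", "un", "trastorno", "de", "dolor", "crónico", "."]
spans = [(1, 6, "Concept"), (3, 4, "Concept"), (8, 12, "Concept")]

bio_tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```

The nested span `(3, 4, "Concept")` is discarded here because it falls inside the longer span starting at token 1.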

It achieves the following results on the evaluation set:

  • Loss: 0.6433
  • Precision: 0.8297
  • Recall: 0.8367
  • F1: 0.8332
  • Accuracy: 0.8876
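As a quick consistency check, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the two values reported above; the helper name `f1_score` is just illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation results listed above.
reported_precision = 0.8297
reported_recall = 0.8367
recomputed_f1 = f1_score(reported_precision, reported_recall)
print(round(recomputed_f1, 4))
```

The recomputed value agrees with the reported F1 of 0.8332 to four decimal places.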

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

The chapter “Token classification” in the Hugging Face online course was used as a starting point for the training process. We made some adaptations because our dataset follows a slightly different structure. In addition, the string labels had to be converted to integer labels.
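The conversion between string labels and integer labels could look like the following sketch. The four entity types come from this card; the label order, the `O` label at index 0, and the names `label2id` / `id2label` are assumptions, so the ids stored in the published model configuration may differ.

```python
# Entity types listed in this model card.
entity_types = ["Concept", "Action", "Predicate", "Reference"]

# Assumed ordering: "O" first, then B-/I- pairs for each entity type.
label_names = ["O"] + [
    f"{prefix}-{entity}" for entity in entity_types for prefix in ("B", "I")
]

label2id = {label: idx for idx, label in enumerate(label_names)}
id2label = {idx: label for label, idx in label2id.items()}


def encode_tags(tags):
    """Map a sequence of BIO string tags to integer ids."""
    return [label2id[tag] for tag in tags]


def decode_ids(ids):
    """Map a sequence of integer ids back to BIO string tags."""
    return [id2label[i] for i in ids]


example_tags = ["O", "B-Concept", "I-Concept", "B-Action", "O"]
example_ids = encode_tags(example_tags)
print(example_ids)
```

Keeping both dictionaries makes it possible to store readable labels alongside a model and to decode predictions back into tag strings.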

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 12
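To make the linear scheduler concrete, the sketch below computes the learning rate at a given optimizer step. The total of 600 steps is taken from the training results table (50 steps per epoch over 12 epochs); zero warmup is an assumption, since no warmup setting is listed here, and `linear_lr` is an illustrative helper, not a library function.

```python
BASE_LR = 2e-05     # learning_rate from the list above
TOTAL_STEPS = 600   # 12 epochs x 50 steps per epoch, per the results table
WARMUP_STEPS = 0    # assumption: no warmup is reported in this card


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 150, 300, 450, 600):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate starts at 2e-05, is halved by step 300, and reaches zero at the final step.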

Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|---------------|-------|------|-----------------|-----------|--------|--------|----------|
| 0.1139        | 1.0   | 50   | 0.3932          | 0.8671    | 0.8378 | 0.8522 | 0.9004   |
| 0.074         | 2.0   | 100  | 0.4334          | 0.8682    | 0.8367 | 0.8522 | 0.9004   |
| 0.0564        | 3.0   | 150  | 0.4498          | 0.8654    | 0.8353 | 0.8501 | 0.8993   |
| 0.0431        | 4.0   | 200  | 0.4683          | 0.8629    | 0.8425 | 0.8526 | 0.8985   |
| 0.0328        | 5.0   | 250  | 0.4850          | 0.8508    | 0.8454 | 0.8481 | 0.8964   |
| 0.027         | 6.0   | 300  | 0.4983          | 0.8608    | 0.8432 | 0.8519 | 0.8988   |
| 0.0253        | 7.0   | 350  | 0.5334          | 0.8618    | 0.8457 | 0.8537 | 0.9004   |
| 0.0242        | 8.0   | 400  | 0.5546          | 0.8636    | 0.8450 | 0.8542 | 0.9009   |
| 0.0233        | 9.0   | 450  | 0.5507          | 0.8543    | 0.8436 | 0.8489 | 0.8961   |
| 0.0203        | 10.0  | 500  | 0.5410          | 0.8605    | 0.8432 | 0.8518 | 0.9001   |
| 0.0179        | 11.0  | 550  | 0.5547          | 0.8603    | 0.8507 | 0.8555 | 0.9006   |
| 0.0149        | 12.0  | 600  | 0.5568          | 0.8616    | 0.8446 | 0.8531 | 0.9012   |
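The per-epoch results can be scanned programmatically, for instance to find which epoch scored best on a given metric. The sketch below copies three columns from the results above; the variable names and the choice of metrics are illustrative, and nothing here implies that a particular checkpoint was selected for the published model.

```python
# (epoch, validation_loss, f1) copied from the training results.
results = [
    (1, 0.3932, 0.8522), (2, 0.4334, 0.8522), (3, 0.4498, 0.8501),
    (4, 0.4683, 0.8526), (5, 0.4850, 0.8481), (6, 0.4983, 0.8519),
    (7, 0.5334, 0.8537), (8, 0.5546, 0.8542), (9, 0.5507, 0.8489),
    (10, 0.5410, 0.8518), (11, 0.5547, 0.8555), (12, 0.5568, 0.8531),
]

best_by_f1 = max(results, key=lambda row: row[2])
best_by_val_loss = min(results, key=lambda row: row[1])

print("Best F1:", best_by_f1[2], "at epoch", best_by_f1[0])
print("Lowest validation loss:", best_by_val_loss[1],
      "at epoch", best_by_val_loss[0])
```

In these numbers the validation loss is lowest after the first epoch and generally rises afterwards, while F1 stays within a narrow band (0.8481 to 0.8555) across all twelve epochs.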

Framework versions

  • Transformers 4.17.0
  • Pytorch 1.10.0+cu111
  • Datasets 2.0.0
  • Tokenizers 0.11.6