---
language: es
datasets:
- squad_es
- hackathon-pln-es/biomed_squad_es_v2
metrics:
- f1
---
# biomedtra-small for QA
This model was trained as part of the "Extractive QA Biomedicine" project developed during the 2022 Hackathon organized by SOMOS NLP.
## Motivation

Given the existence of masked language models pretrained on Spanish biomedical corpora, the objective of this project was to use them to build extractive QA models for biomedicine and compare their effectiveness with that of general-domain masked language models.
The models trained during the Hackathon were:

- hackathon-pln-es/roberta-base-bne-squad2-es
- hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es
- hackathon-pln-es/roberta-base-biomedical-es-squad2-es
- hackathon-pln-es/biomedtra-small-es-squad2-es
## Description
This model is a fine-tuned version of mrm8488/biomedtra-small-es on the squad_es (v2) training dataset.
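A minimal usage sketch with the `transformers` question-answering pipeline (the question and context below are illustrative, not taken from the evaluation data):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint for extractive QA in Spanish.
qa = pipeline(
    "question-answering",
    model="hackathon-pln-es/biomedtra-small-es-squad2-es",
)

# Illustrative biomedical context and question.
context = (
    "Los anticuerpos son proteínas producidas por el sistema inmunitario "
    "para neutralizar patógenos como bacterias y virus."
)
result = qa(question="¿Qué son los anticuerpos?", context=context)

# The pipeline returns a dict with the extracted span and its score.
print(result["answer"], result["score"])
```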
## Hyperparameters

The hyperparameters were chosen based on those used for sultan/BioM-ELECTRA-Large-SQuAD2, an English model trained for a similar purpose:
```
--num_train_epochs 5
--learning_rate 5e-5
--max_seq_length 512
--doc_stride 128
```
## Performance

The models were evaluated on the hackathon-pln-es/biomed_squad_es_v2 dev set. Each model was trained for 5 epochs, and the epoch with the best f1 score was selected.
| Model | Base Model Domain | exact | f1 | HasAns_exact | HasAns_f1 | NoAns_exact | NoAns_f1 |
|---|---|---|---|---|---|---|---|
| hackathon-pln-es/roberta-base-bne-squad2-es | General | 67.6341 | 75.6988 | 53.7367 | 70.0526 | 81.2174 | 81.2174 |
| hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es | Biomedical | 66.8426 | 75.2346 | 53.0249 | 70.0031 | 80.3478 | 80.3478 |
| hackathon-pln-es/roberta-base-biomedical-es-squad2-es | Biomedical | 67.6341 | 74.5612 | 47.6868 | 61.7012 | 87.1304 | 87.1304 |
| hackathon-pln-es/biomedtra-small-es-squad2-es | Biomedical | 29.6394 | 36.3170 | 32.2064 | 45.7160 | 27.1304 | 27.1304 |
## Team
Santiago Maximo: smaximo