---
language: es
datasets:
- squad_es
- hackathon-pln-es/biomed_squad_es_v2
metrics:
- "f1"

---

# biomedtra-small for QA 

This model was trained as part of the "Extractive QA Biomedicine" project developed during the 2022 [Hackathon](https://somosnlp.org/hackathon) organized by SOMOS NLP.

## Motivation

Given the existence of masked language models pretrained on Spanish biomedical corpora, the objective of this project is to use them to build extractive QA models for biomedicine and compare their effectiveness against models based on general-domain masked language models.

The models trained during the [Hackathon](https://somosnlp.org/hackathon) were:

- [hackathon-pln-es/roberta-base-bne-squad2-es](https://huggingface.co/hackathon-pln-es/roberta-base-bne-squad2-es)
- [hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es](https://huggingface.co/hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es)
- [hackathon-pln-es/roberta-base-biomedical-es-squad2-es](https://huggingface.co/hackathon-pln-es/roberta-base-biomedical-es-squad2-es)
- [hackathon-pln-es/biomedtra-small-es-squad2-es](https://huggingface.co/hackathon-pln-es/biomedtra-small-es-squad2-es)

## Description

This model is a fine-tuned version of [mrm8488/biomedtra-small-es](https://huggingface.co/mrm8488/biomedtra-small-es) on the [squad_es (v2)](https://huggingface.co/datasets/squad_es) training dataset.
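
As a usage sketch, the model can be loaded with the `transformers` question-answering pipeline; the context and question below are illustrative placeholders, not taken from the dataset:

```python
from transformers import pipeline

# Load the fine-tuned extractive QA model from the Hub
qa_pipeline = pipeline(
    "question-answering",
    model="hackathon-pln-es/biomedtra-small-es-squad2-es",
)

# Illustrative biomedical context and question (placeholders)
context = (
    "La metformina es un fármaco antidiabético oral que reduce la "
    "producción hepática de glucosa."
)
result = qa_pipeline(
    question="¿Qué efecto tiene la metformina?",
    context=context,
    handle_impossible_answer=True,  # SQuAD v2 style: the context may lack an answer
)
print(result)  # e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```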


## Hyperparameters

The hyperparameters were chosen based on those used in [sultan/BioM-ELECTRA-Large-SQuAD2](https://huggingface.co/sultan/BioM-ELECTRA-Large-SQuAD2), an English-language model trained for a similar purpose:

```
 --num_train_epochs 5
 --learning_rate 5e-5
 --max_seq_length 512
 --doc_stride 128 
```
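
These flags correspond to arguments of the Hugging Face `run_qa.py` example script. As a minimal Python sketch of the equivalent settings (assuming the standard `transformers` APIs; the optimizer flags map onto `TrainingArguments`, while `max_seq_length` and `doc_stride` are applied during SQuAD-style preprocessing, and `output_dir` is an illustrative placeholder):

```python
from transformers import AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("mrm8488/biomedtra-small-es")

# Optimizer-side flags map onto TrainingArguments
training_args = TrainingArguments(
    output_dir="biomedtra-small-es-squad2-es",  # placeholder, not reported
    num_train_epochs=5,
    learning_rate=5e-5,
)

# Sequence-side flags are applied when tokenizing: long contexts are split
# into overlapping windows of max_seq_length tokens with doc_stride overlap
def preprocess(question: str, context: str):
    return tokenizer(
        question,
        context,
        max_length=512,            # --max_seq_length
        stride=128,                # --doc_stride
        truncation="only_second",  # truncate the context, never the question
        return_overflowing_tokens=True,
        return_offsets_mapping=True,
        padding="max_length",
    )
```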

## Performance

All models were evaluated on the [hackathon-pln-es/biomed_squad_es_v2](https://huggingface.co/datasets/hackathon-pln-es/biomed_squad_es_v2) dev set.

Each model was trained for 5 epochs, and the epoch with the best F1 score was selected.

|Model                                                         |Base Model Domain|exact  |f1     |HasAns_exact|HasAns_f1|NoAns_exact|NoAns_f1|
|--------------------------------------------------------------|-----------------|-------|-------|------------|---------|-----------|--------|
|hackathon-pln-es/roberta-base-bne-squad2-es                   |General          |67.6341|75.6988|53.7367     |70.0526  |81.2174    |81.2174 |
|hackathon-pln-es/roberta-base-biomedical-clinical-es-squad2-es|Biomedical       |66.8426|75.2346|53.0249     |70.0031  |80.3478    |80.3478 |
|hackathon-pln-es/roberta-base-biomedical-es-squad2-es         |Biomedical       |67.6341|74.5612|47.6868     |61.7012  |87.1304    | 87.1304|
|hackathon-pln-es/biomedtra-small-es-squad2-es                 |Biomedical       |29.6394|36.317 |32.2064     |45.716   |27.1304    |27.1304 |
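
The exact/F1 columns (and their HasAns/NoAns splits) follow the SQuAD v2 metric. A minimal sketch of computing it with the `evaluate` library, shown on a toy prediction/reference pair rather than the actual dev set:

```python
import evaluate

# SQuAD v2 metric: reports exact/f1 plus HasAns_*/NoAns_* breakdowns
squad_v2 = evaluate.load("squad_v2")

# Toy example in the format the metric expects (illustrative only)
predictions = [{
    "id": "0",
    "prediction_text": "metformina",
    "no_answer_probability": 0.0,
}]
references = [{
    "id": "0",
    "answers": {"text": ["metformina"], "answer_start": [0]},
}]
print(squad_v2.compute(predictions=predictions, references=references))
```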

## Team
Santiago Maximo: [smaximo](https://huggingface.co/smaximo)