miguel6nunes committed on
Commit
1d77fe7
1 Parent(s): 64e24b6

Update README.md

  1. README.md +98 -3
README.md CHANGED
@@ -1,3 +1,98 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+
4
+ language:
5
+ - pt
6
+
7
+ pipeline_tag: fill-mask
8
+
9
+ tags:
10
+ - medialbertina-ptpt
11
+ - deberta
12
+ - portuguese
13
+ - european portuguese
14
+ - medical
15
+ - clinical
16
+ - healthcare
17
+ - encoder
18
+
19
+ widget:
20
+ - text: "Febre e tosse são sintomas comuns de [MASK]"
21
+ example_title: "Example 1"
22
+ - text: "Diabetes [MASK] tipo II"
23
+ example_title: "Example 2"
24
+ - text: "Utente tolera dieta [MASK] / Nivel de glicémia bom."
25
+ example_title: "Example 3"
26
+ - text: "Doente com administração de [MASK] com tramal."
27
+ example_title: "Example 4"
28
+ - text: "Colocada sonda de gases por apresentar [MASK] timpanizado"
29
+ example_title: "Example 5"
30
+ - text: "Conectada em PRVC com necessidade de aumentar [MASK] para 70%"
31
+ example_title: "Example 6"
32
+ - text: "Medicado com [MASK] em dias alternados."
33
+ example_title: "Example 7"
34
+ - text: "Realizado teste de [MASK] ao paciente"
35
+ example_title: "Example 8"
36
+ - text: "Sintomas apontam para COVID [MASK]."
37
+ example_title: "Example 9"
38
+ - text: "Durante internamento fez [MASK] fresco congelado 3x dia"
39
+ example_title: "Example 10"
40
+ - text: "Pupilas iso [MASK]."
41
+ example_title: "Example 11"
42
+ - text: "Cardiopatia [MASK] - causa provável: HAS"
43
+ example_title: "Example 12"
44
+ - text: "O paciente encontra-se [MASK] estável."
45
+ example_title: "Example 13"
46
+ - text: "Traumatismo [MASK] após acidente de viação."
47
+ example_title: "Example 14"
48
+ - text: "Analgesia com morfina em perfusão (15 [MASK]/kg/h)"
49
+ example_title: "Example 15"
50
+ ---
51
+
52
+ # MediAlbertina
53
+ The first publicly available medical language models trained with real European Portuguese data.
54
+
55
+ MediAlbertina is a family of DeBERTaV2-based encoders from the BERT family, obtained by continuing the pre-training of [PORTULAN's Albertina](https://huggingface.co/PORTULAN) models on Electronic Medical Records shared by Portugal's largest public hospital.
56
+
57
+ Like their predecessors, the MediAlbertina models are distributed under the [MIT license](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m/blob/main/LICENSE).
58
+
59
+
60
+
61
+ # Model Description
62
+
63
+ MediAlbertina PT-PT 1.5B was created through domain adaptation of [Albertina PT-PT 1.5B](https://huggingface.co/PORTULAN/albertina-1b5-portuguese-ptpt-encoder) on real European Portuguese EMRs using masked language modeling. It was evaluated by fine-tuning on two Information Extraction (IE) tasks, Named Entity Recognition (NER) and Assertion Status (AStatus), using more than 10k manually annotated entities belonging to the following classes: Diagnosis, Symptom, Vital Sign, Result, Medical Procedure, Medication, Dosage, and Progress.
64
+ In both tasks, MediAlbertina PT-PT 1.5B outperformed its predecessors in every configuration except NER Multi-Models (Prog), demonstrating the effectiveness of this domain adaptation and its potential for medical AI in Portugal.
65
+
66
+ | Model | NER Single Model | NER Multi-Models (Diag+Symp) | NER Multi-Models (Med+Dos) | NER Multi-Models (MP+VS+R) | NER Multi-Models (Prog) | Assertion Status (Diag) | Assertion Status (Symp) | Assertion Status (Med) |
67
+ |-------------------------------|:----------------:|:-----------------------------:|:--------------------------:|:-------------------------:|:-----------------------:|:------------------:|:-----------------:|:-------------------:|
68
+ | | F1-score | F1-score | F1-score | F1-score | F1-score | F1-score | F1-score | F1-score |
69
+ | Albertina PT-PT 900M | 0.813 | 0.771 | 0.886 | 0.777 | 0.784 | 0.703 | 0.803 | 0.556 |
70
+ | Albertina PT-PT 1.5B | 0.838 | 0.801 | 0.924 | 0.836 | **0.877** | 0.772 | 0.881 | 0.862 |
71
+ | MediAlbertina PT-PT 900M | 0.832 | 0.801 | 0.916 | 0.810 | 0.864 | 0.722 | 0.823 | 0.723 |
72
+ | MediAlbertina PT-PT 1.5B | **0.843** | **0.813** | **0.926** | **0.851** | 0.858 | **0.789** | **0.886** | **0.868** |
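To make the comparison in the table easier to read, the following sketch computes the per-column F1 difference between MediAlbertina PT-PT 1.5B and its base model, Albertina PT-PT 1.5B. The scores are copied from the table; the short column labels and the helper name `f1_deltas` are assumptions introduced only for this illustration.

```python
# F1-scores copied from the results table; column labels are shorthand
# chosen for this sketch, not names used by the authors.
COLUMNS = [
    "NER single", "NER Diag+Symp", "NER Med+Dos", "NER MP+VS+R",
    "NER Prog", "AStatus Diag", "AStatus Symp", "AStatus Med",
]
ALBERTINA_1B5 = [0.838, 0.801, 0.924, 0.836, 0.877, 0.772, 0.881, 0.862]
MEDIALBERTINA_1B5 = [0.843, 0.813, 0.926, 0.851, 0.858, 0.789, 0.886, 0.868]


def f1_deltas(adapted, baseline):
    """Return adapted - baseline per column, rounded to three decimals."""
    return [round(a - b, 3) for a, b in zip(adapted, baseline)]


deltas = dict(zip(COLUMNS, f1_deltas(MEDIALBERTINA_1B5, ALBERTINA_1B5)))
for name, delta in deltas.items():
    print(f"{name:14s} {delta:+.3f}")
```

A positive value means the domain-adapted model scored higher than its base model in that column.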
73
+
74
+
75
+
76
+ ## Data
77
+
78
+ MediAlbertina PT-PT 1.5B was trained on more than 15M sentences and 300M tokens from 2.6M fully anonymized and unique Electronic Medical Records (EMRs) from Portugal's largest public hospital. This data was acquired under the framework of the [FCT project DSAIPA/AI/0122/2020 AIMHealth-Mobile Applications Based on Artificial Intelligence](https://ciencia.iscte-iul.pt/projects/aplicacoes-moveis-baseadas-em-inteligencia-artificial-para-resposta-de-saude-publica/1567).
79
+
80
+
81
+ ## How to use
82
+
83
+ ```Python
84
+ from transformers import pipeline
85
+
86
+ unmasker = pipeline('fill-mask', model='portugueseNLP/medialbertina_pt-pt_1.5b')
87
+ unmasker("Analgesia com morfina em perfusão (15 [MASK]/kg/h)")
88
+ ```
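For an input with a single `[MASK]`, the `fill-mask` pipeline returns a list of dictionaries with the keys `score`, `token`, `token_str`, and `sequence`. The sketch below shows one possible way to reduce that output to the top candidates. The helper `top_predictions` is hypothetical, and `example_results` is made-up placeholder data in the pipeline's output shape, not real model predictions.

```python
def top_predictions(results, k=3):
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["token_str"].strip(), round(r["score"], 3)) for r in ranked[:k]]


# Placeholder data mimicking the structure of the pipeline output for one
# [MASK]; tokens, ids, and scores are invented for illustration only.
example_results = [
    {"score": 0.25, "token": 101, "token_str": "opt_a", "sequence": "... opt_a ..."},
    {"score": 0.61, "token": 102, "token_str": "opt_b", "sequence": "... opt_b ..."},
    {"score": 0.04, "token": 103, "token_str": "opt_c", "sequence": "... opt_c ..."},
]

print(top_predictions(example_results, k=2))

# With the real model loaded as shown above, the call would look like:
# top_predictions(unmasker("Diabetes [MASK] tipo II"))
```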
89
+
90
+ ## Citation
91
+
92
+ MediAlbertina is developed by a joint team from [ISCTE-IUL](https://www.iscte-iul.pt/), Portugal, and [Select Data](https://selectdata.com/), CA, USA. For a full description, see the corresponding publication:
93
+
94
+ ```latex
95
+ In publishing process. Reference will be added soon.
96
+ ```
97
+ Please use the above canonical reference when using or citing this model.
98
+