miguel6nunes committed on
Commit
a5854b9
1 Parent(s): a139cb0

Update README.md

Files changed (1)
  1. README.md +27 -22
README.md CHANGED
@@ -33,43 +33,48 @@ widget:
33
  example_title: Example 5
34
  - text: Monitorização da Freq. cardíaca com 90 bpm. P Arterial de 120-80 mmHg
35
  example_title: Example 6
36
- - text: A ressonância magnética da utente revelou uma ruptura no menisco lateral do joelho.
37
  example_title: Example 7
38
  - text: A paciente foi diagnosticada com esclerose múltipla e iniciou terapia com imunomoduladores.
 
39
  ---
40
 
41
  # MediAlbertina
42
- The first publicly available medical language models trained with real European Portuguese data.
43
 
44
  MediAlbertina is a family of DeBERTaV2-based encoders from the BERT family, obtained by continuing the pre-training of [PORTULAN's Albertina](https://huggingface.co/PORTULAN) models on Electronic Medical Records shared by Portugal's largest public hospital.
45
 
46
- Like its predecessors, MediAlbertina models are distributed under the [MIT license](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m/blob/main/LICENSE).
47
 
48
 
49
 
50
  # Model Description
51
 
52
- MediAlbertina PT-PT 900M NER was created through fine-tuning of [MediAlbertina PT-PT 900M](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m) on real European Portuguese EMRs that have been hand-annotated for the following entities:
53
- - Diagnostico
54
- - Sintoma
55
- - Medicamento
56
- - Dosagem
57
- - ProcedimentoMedico
58
- - SinalVital
59
- - Resultado
60
- - Progresso
61
 
62
- MediAlbertina PT-PT 900M NER achieved superior results to the same adaptation made on a non-medical Portuguese language model, demonstrating the effectiveness of this domain adaptation, and its potential for medical AI in Portugal.
 
 
 
 
 
 
 
 
 
63
 
64
- | Model | NER single-model | NER multi-models | Assertion Status |
65
- |-------------------------|:----------------:|:----------------:|:----------------:|
66
- | | F1-score | F1-score | F1-score |
67
- |albertina-900m-portuguese-ptpt-encoder | 0.813 | 0.811 | 0.687 |
68
- | **medialbertina_pt-pt_900m** | **0.832** | **0.848** | **0.755** |
69
 
70
  ## Data
71
 
72
- MediAlbertina PT-PT 900M NER was fine-tuned on more than 10k hand-annotated entities from more than a thousand fully anonymized medical sentences from Portugal's largest public hospital. This data was acquired under the framework of the [FCT project DSAIPA/AI/0122/2020 AIMHealth-Mobile Applications Based on Artificial Intelligence](https://ciencia.iscte-iul.pt/projects/aplicacoes-moveis-baseadas-em-inteligencia-artificial-para-resposta-de-saude-publica/1567).
73
 
74
 
75
  ## How to use
@@ -77,11 +82,11 @@ MediAlbertina PT-PT 900M NER was fine-tuned on more than 10k hand-annotated enti
77
  ```Python
78
  from transformers import pipeline
79
 
80
- ner_pipeline = pipeline('ner', model='portugueseNLP/medialbertina_pt-pt_900m_NER', aggregation_strategy='average')
81
  sentence = 'Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.'
82
  entities = ner_pipeline(sentence)
83
  for entity in entities:
84
- print(f"{entity['entity_group']} - {sentence[entity['start']:entity['end']]}")
85
  ```
86
 
87
  ## Citation
@@ -91,4 +96,4 @@ MediAlbertina is developed by a joint team from [ISCTE-IUL](https://www.iscte-iu
91
  ```latex
92
  In publishing process. Reference will be added soon.
93
  ```
94
- Please use the above canonical reference when using or citing this model.
 
33
  example_title: Example 5
34
  - text: Monitorização da Freq. cardíaca com 90 bpm. P Arterial de 120-80 mmHg
35
  example_title: Example 6
36
+ - text: A ressonância magnética da utente revelou uma rotura no menisco lateral do joelho.
37
  example_title: Example 7
38
  - text: A paciente foi diagnosticada com esclerose múltipla e iniciou terapia com imunomoduladores.
39
+ example_title: Example 8
40
  ---
41
 
42
  # MediAlbertina
43
+ The first publicly available medical language model trained with real European Portuguese data.
44
 
45
  MediAlbertina is a family of DeBERTaV2-based encoders from the BERT family, obtained by continuing the pre-training of [PORTULAN's Albertina](https://huggingface.co/PORTULAN) models on Electronic Medical Records shared by Portugal's largest public hospital.
46
 
47
+ Like its predecessors, MediAlbertina models are distributed under the [MIT license](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m_NER_all/blob/main/LICENSE).
48
 
49
 
50
 
51
  # Model Description
52
 
53
+ **MediAlbertina PT-PT 900M NER all** was created through fine-tuning of [MediAlbertina PT-PT 900M](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m) on real European Portuguese EMRs that have been hand-annotated for the following entities:
54
+ - **Diagnostico (D)**: All types of diseases and conditions following the ICD-10-CM guidelines.
55
+ - **Sintoma (S)**: Any complaints or evidence from healthcare professionals indicating that a patient is experiencing a medical condition.
56
+ - **Medicamento (M)**: Anything administered to the patient (through any route), including drugs, specific food/drink, vitamins, or blood for transfusion.
57
+ - **Dosagem (DO)**: Dosage and frequency of medication administration.
58
+ - **ProcedimentoMedico (PM)**: Anything healthcare professionals do related to patients, including exams, moving patients, administering something, or even surgeries.
59
+ - **SinalVital (SV)**: Quantifiable indicators in a patient that can be measured, always associated with a specific result. Examples include cholesterol levels, diuresis, weight, or glycaemia.
60
+ - **Resultado (R)**: A result associated with a Medical Procedure or a Vital Sign. It can be a numerical value if something was measured (e.g., the value associated with blood pressure) or a descriptor of the outcome (e.g., positive/negative, functional).
61
+ - **Progresso (P)**: Describes the progress of the patient’s condition. It typically includes verbs such as improving, evolving, or regressing, and mentions of the patient’s stability.
62
 
63
+ MediAlbertina PT-PT 900M NER all outperformed the same adaptation of a non-medical Portuguese language model on 13 of the 16 tags in the table below, demonstrating the effectiveness of this domain adaptation and its potential for medical AI in Portugal.
64
+
65
+ | Model | B-D | I-D | B-S | I-S | B-PM | I-PM | B-SV | I-SV | B-R | I-R | B-M | I-M | B-DO | I-DO | B-P | I-P |
66
+ |-------------------------|:----:|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
67
+ | | F1 | F1 | F1 | F1 | F1 | F1 | F1 | F1 | F1 | F1 | F1 | F1 | F1 | F1 | F1 | F1 |
68
+ | albertina-900m-portuguese-ptpt-encoder|0.721|0.786|0.734|0.775|0.737|0.805|0.859|**0.811**|0.803|0.816|0.913|0.871|**0.853**|**0.895**|0.769|0.785|
69
+ | **medialbertina_pt-pt_900m** | **0.799**| **0.832**| **0.754**| **0.782**| **0.786**| **0.813**| **0.916**|0.788| **0.821**| **0.83**| **0.926**| **0.895**|0.85|0.885| **0.779**| **0.807**|
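
The column names in the table above appear to combine a BIO prefix (`B` for the first token of an entity, `I` for continuation tokens) with the entity abbreviations. As a minimal sketch, assuming tags are written in this `PREFIX-ABBREVIATION` form (the label strings actually stored in the model configuration may differ), a tag can be expanded back into its entity name. `TAG_NAMES` and `expand_tag` are hypothetical names introduced here for illustration.

```python
# Abbreviations taken from the table header; the mapping itself is an
# assumption about how the header relates to the entity list.
TAG_NAMES = {
    "D": "Diagnostico",
    "S": "Sintoma",
    "PM": "ProcedimentoMedico",
    "SV": "SinalVital",
    "R": "Resultado",
    "M": "Medicamento",
    "DO": "Dosagem",
    "P": "Progresso",
}


def expand_tag(tag):
    """Split a BIO tag such as 'B-PM' into (prefix, full entity name)."""
    if tag == "O":
        # 'O' marks a token outside any entity
        return ("O", None)
    prefix, _, abbreviation = tag.partition("-")
    if prefix not in ("B", "I"):
        raise ValueError(f"Unexpected BIO prefix in tag: {tag!r}")
    return (prefix, TAG_NAMES[abbreviation])


print(expand_tag("B-PM"))
print(expand_tag("I-DO"))
```

With `aggregation_strategy` set in the pipeline, the `B`/`I` prefixes are merged away and only the grouped entity label is returned, so a helper like this is mainly useful when reading per-tag scores.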
70
+
71
+
72
+
73
 
 
 
 
 
 
74
 
75
  ## Data
76
 
77
+ **MediAlbertina PT-PT 900M NER all** was fine-tuned on about 10k hand-annotated medical entities from about 4k fully anonymized medical sentences from Portugal's largest public hospital. This data was acquired under the framework of the [FCT project DSAIPA/AI/0122/2020 AIMHealth-Mobile Applications Based on Artificial Intelligence](https://ciencia.iscte-iul.pt/projects/aplicacoes-moveis-baseadas-em-inteligencia-artificial-para-resposta-de-saude-publica/1567).
78
 
79
 
80
  ## How to use
 
82
  ```Python
83
  from transformers import pipeline
84
 
85
+ ner_pipeline = pipeline('ner', model='portugueseNLP/medialbertina_pt-pt_900m_NER_all', aggregation_strategy='average')
86
  sentence = 'Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.'
87
  entities = ner_pipeline(sentence)
88
  for entity in entities:
89
+     print(f"{entity['entity_group']} - {sentence[entity['start']:entity['end']]}")
90
  ```
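
The loop above prints one line per entity. As a hedged sketch of a possible next step, the aggregated spans can be grouped by label and filtered by confidence. The helper names `make_span` and `group_by_label`, the `min_score` threshold, and the sample entities are all assumptions made for illustration: the sample list is hand-written to mimic the shape of the pipeline's aggregated output and is not real model output.

```python
from collections import defaultdict

sentence = 'Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.'


def make_span(text, label, score):
    """Build a dict shaped like one aggregated pipeline result (illustrative only)."""
    start = sentence.index(text)
    return {"entity_group": label, "score": score, "start": start, "end": start + len(text)}


# Hypothetical entities: labels and scores are invented, not produced by the model.
sample_entities = [
    make_span("procedimento endoscópico", "ProcedimentoMedico", 0.98),
    make_span("pólipos", "Diagnostico", 0.91),
]


def group_by_label(text, entities, min_score=0.0):
    """Collect entity surface forms per label, dropping spans below min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(text[entity["start"]:entity["end"]])
    return dict(grouped)


print(group_by_label(sentence, sample_entities, min_score=0.5))
```

To apply this to real predictions, pass `ner_pipeline(sentence)` in place of `sample_entities`.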
91
 
92
  ## Citation
 
96
  ```latex
97
  In publishing process. Reference will be added soon.
98
  ```
99
+ Please use the above canonical reference when using or citing this model.