---
license: mit

inference:
  parameters:
    aggregation_strategy: "average"

language:
  - pt
pipeline_tag: token-classification
tags:
  - medialbertina-ptpt
  - deberta
  - portuguese
  - european portuguese
  - medical
  - clinical
  - healthcare
  - NER
  - Named Entity Recognition
  - IE
  - Information Extraction
widget:
  - text: Durante a cirurgia ortopédica para corrigir a fratura no tornozelo, os sinais vitais do utente, incluindo a pressão arterial, com leitura de 120/87 mmHg e a frequência cardíaca, de 80 batimentos por minuto, foram monitorizados. Após a cirurgia o utente apresentava  dor intensa no local e inchaço no tornozelo, mas os resultados da radiografia revelaram uma recuperação satisfatória. Foi prescrito ibuprofeno 600mg de 8 em 8 horas durante 3 dias.
    example_title: Example 1
  - text: Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.
    example_title: Example 2
  - text: Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias.
    example_title: Example 3
  - text: Após as sessões de fisioterapia o paciente apresenta recuperação de mobilidade.
    example_title: Example 4
  - text: O paciente está em Quimioterapia com uma dosagem específica de Cisplatina para o tratamento do cancro do pulmão.
    example_title: Example 5
  - text: Monitorização da  Freq. cardíaca com 90 bpm. P Arterial de 120-80 mmHg
    example_title: Example 6
  - text: A ressonância magnética da utente revelou uma rotura no menisco lateral do joelho.
    example_title: Example 7
  - text:  A paciente foi diagnosticada com esclerose múltipla e iniciou terapia com imunomoduladores.
    example_title: Example 8
---

# MediAlbertina
The first publicly available medical language model trained with real European Portuguese data.

MediAlbertina is a family of DeBERTaV2-based encoders from the BERT family, obtained by continuing the pre-training of [PORTULAN's Albertina](https://huggingface.co/PORTULAN) models on electronic medical records (EMRs) shared by Portugal's largest public hospital.

Like their predecessors, MediAlbertina models are distributed under the [MIT license](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m_NER/blob/main/LICENSE).


# Model Description

**MediAlbertina PT-PT 900M NER** was created through fine-tuning of [MediAlbertina PT-PT 900M](https://huggingface.co/portugueseNLP/medialbertina_pt-pt_900m) on real European Portuguese EMRs that have been hand-annotated for the following entities:
- **Diagnostico (D)**: All types of diseases and conditions following the ICD-10-CM guidelines.
- **Sintoma (S)**: Any complaints or evidence from healthcare professionals indicating that a patient is experiencing a medical condition.
- **Medicamento (M)**: Anything administered to the patient (through any route), including drugs, specific food/drink, vitamins, or blood for transfusion.
- **Dosagem (DO)**: Dosage and frequency of medication administration.
- **ProcedimentoMedico (PM)**: Anything healthcare professionals do related to patients, including exams, moving patients, administering something, or even surgeries.
- **SinalVital (SV)**: Quantifiable indicators in a patient that can be measured, always associated with a specific result. Examples include cholesterol levels, diuresis, weight, or glycaemia.
- **Resultado (R)**: The outcome associated with a medical procedure or vital sign. It can be a numerical value if something was measured (e.g., the value associated with blood pressure) or a descriptor of the result (e.g., positive/negative, functional).
- **Progresso (P)**: Describes the progress of the patient's condition. Typically, it includes verbs such as improving, evolving, or regressing, and mentions of the patient's stability.
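
The per-token tags reported in the results table combine these entity abbreviations with `B-` (beginning) and `I-` (inside) prefixes. As a minimal sketch, assuming the standard BIO scheme with an extra `O` tag for tokens outside any entity, the tag set could be built as follows. The `O` tag, the ordering, and the helper name `build_bio_tags` are assumptions for illustration; the checkpoint's own `id2label` mapping remains authoritative.

```python
# Entity abbreviations as they appear in the results table of this card.
ENTITY_TYPES = {
    "D": "Diagnostico",
    "S": "Sintoma",
    "PM": "ProcedimentoMedico",
    "SV": "SinalVital",
    "R": "Resultado",
    "M": "Medicamento",
    "DO": "Dosagem",
    "P": "Progresso",
}


def build_bio_tags(abbreviations):
    """Return ['O', 'B-X', 'I-X', ...] for each entity abbreviation X."""
    tags = ["O"]  # assumed tag for tokens outside any entity
    for abbr in abbreviations:
        tags.extend([f"B-{abbr}", f"I-{abbr}"])
    return tags


BIO_TAGS = build_bio_tags(ENTITY_TYPES)
print(len(BIO_TAGS), BIO_TAGS[:5])
```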
  
**MediAlbertina PT-PT 900M NER** outperformed the same fine-tuning applied to a non-medical Portuguese language model on 13 of the 16 entity labels below, demonstrating the effectiveness of this domain adaptation and its potential for medical AI in Portugal.

| Model                   | B-D | I-D | B-S | I-S | B-PM | I-PM | B-SV | I-SV | B-R | I-R | B-M | I-M | B-DO | I-DO | B-P | I-P | 
|-------------------------|:----:|:----:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|                         | F1   | F1   | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  | F1  |
| albertina-900m-portuguese-ptpt-encoder|0.721|0.786|0.734|0.775|0.737|0.805|0.859|**0.811**|0.803|0.816|0.913|0.871|**0.853**|**0.895**|0.769|0.785|
| **medialbertina_pt-pt_900m** | **0.799**| **0.832**| **0.754**| **0.782**| **0.786**| **0.813**| **0.916**| 0.788| **0.821**| **0.83**| **0.926**| **0.895**|0.85|0.885| **0.779**| **0.807**|
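
To summarize the table, the per-label scores can be compared programmatically. The sketch below copies the F1 values from the table and computes an unweighted mean per model plus the labels on which MediAlbertina scores higher. The unweighted mean is an illustrative aggregate chosen here, not a metric reported by the authors, and it ignores how many examples each label has.

```python
LABELS = ["B-D", "I-D", "B-S", "I-S", "B-PM", "I-PM", "B-SV", "I-SV",
          "B-R", "I-R", "B-M", "I-M", "B-DO", "I-DO", "B-P", "I-P"]

# Per-label F1 scores copied from the table above.
ALBERTINA_F1 = [0.721, 0.786, 0.734, 0.775, 0.737, 0.805, 0.859, 0.811,
                0.803, 0.816, 0.913, 0.871, 0.853, 0.895, 0.769, 0.785]
MEDIALBERTINA_F1 = [0.799, 0.832, 0.754, 0.782, 0.786, 0.813, 0.916, 0.788,
                    0.821, 0.83, 0.926, 0.895, 0.85, 0.885, 0.779, 0.807]


def unweighted_mean(values):
    """Plain arithmetic mean; every label counts equally."""
    return sum(values) / len(values)


# Labels where the domain-adapted model has the higher F1.
medialbertina_wins = [
    label
    for label, base, med in zip(LABELS, ALBERTINA_F1, MEDIALBERTINA_F1)
    if med > base
]

print(f"Albertina mean F1:     {unweighted_mean(ALBERTINA_F1):.3f}")
print(f"MediAlbertina mean F1: {unweighted_mean(MEDIALBERTINA_F1):.3f}")
print(f"MediAlbertina higher on {len(medialbertina_wins)} of {len(LABELS)} labels")
```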


## Data

**MediAlbertina PT-PT 900M NER** was fine-tuned on about 10k hand-annotated medical entities from about 4k fully anonymized medical sentences from Portugal's largest public hospital. This data was acquired under the framework of the [FCT project DSAIPA/AI/0122/2020 AIMHealth-Mobile Applications Based on Artificial Intelligence](https://ciencia.iscte-iul.pt/projects/aplicacoes-moveis-baseadas-em-inteligencia-artificial-para-resposta-de-saude-publica/1567).


## How to use

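```python
from transformers import pipeline

# Load the fine-tuned NER model. The 'average' strategy averages sub-token
# scores within each word and groups adjacent tokens into entity spans.
ner_pipeline = pipeline('ner', model='portugueseNLP/medialbertina_pt-pt_900m_NER', aggregation_strategy='average')
sentence = 'Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.'
entities = ner_pipeline(sentence)

# Print each entity type with the exact text span it covers.
for entity in entities:
    print(f"{entity['entity_group']} - {sentence[entity['start']:entity['end']]}")
```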
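
With an aggregation strategy set, the pipeline returns a list of dictionaries with `entity_group`, `score`, `word`, `start`, and `end` keys. For downstream use it is often convenient to group mentions by entity type and drop low-confidence spans. The following is a minimal sketch of that post-processing; `group_entities`, `make_entity`, and `min_score` are hypothetical names, and the sample entities are hand-written placeholders (including their label strings and scores), not real model output.

```python
from collections import defaultdict


def group_entities(sentence, entities, min_score=0.0):
    """Map each entity type to the text spans scoring at least min_score."""
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(sentence[ent["start"]:ent["end"]])
    return dict(grouped)


sentence = "Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias."


def make_entity(text, label, score):
    """Build a placeholder dict shaped like one aggregated pipeline result."""
    start = sentence.index(text)
    return {"entity_group": label, "score": score, "word": text,
            "start": start, "end": start + len(text)}


# Hand-written placeholders standing in for ner_pipeline(sentence).
sample_entities = [
    make_entity("aspirina", "Medicamento", 0.99),
    make_entity("500mg a cada 4 horas", "Dosagem", 0.95),
    make_entity("3 dias", "Dosagem", 0.40),
]

grouped = group_entities(sentence, sample_entities, min_score=0.5)
print(grouped)
```

In practice, `sample_entities` would be replaced by the list returned from `ner_pipeline(sentence)`, and the threshold tuned on held-out data.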

## Citation

MediAlbertina is developed by a joint team from [ISCTE-IUL](https://www.iscte-iul.pt/), Portugal, and [Select Data](https://selectdata.com/), CA, USA. For a full description, see the corresponding publication:

```latex
Publication in progress. The reference will be added soon.
```
Please use the above canonical reference when using or citing this model.

## Acknowledgements

This work was financially supported by Project Blockchain.PT – Decentralize Portugal with Blockchain Agenda (Project no. 51), WP2, Call no. 02/C05-i01.01/2022, funded by the Portuguese Recovery and Resilience Program (PRR), the Portuguese Republic, and the European Union (EU) under the framework of the Next Generation EU Program.