metadata
license: mit
inference:
  parameters:
    aggregation_strategy: average
language:
  - pt
pipeline_tag: fill-mask
tags:
  - medialbertina-ptpt
  - deberta
  - portuguese
  - european portuguese
  - medical
  - clinical
  - healthcare
  - NER
  - Named Entity Recognition
  - IE
  - Information Extraction
widget:
  - text: >-
      Durante a cirurgia ortopédica para corrigir a fratura no tornozelo, os
      sinais vitais do utente, incluindo a pressão arterial, com leitura de
      120/87 mmHg, a frequência cardíaca, de 80 batimentos por minuto, e SpO2 a
      98%, foram monitorizados. Após a cirurgia o utente apresentava  dor
      intensa no local e inchaço no tornozelo, mas os resultados dos exames de
      radiografia revelaram uma recuperação satisfatória.
    example_title: Example 1
  - text: >-
      Durante o procedimento endoscópico, foram encontrados pólipos no cólon do
      paciente.
    example_title: Example 2
  - text: Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias.
    example_title: Example 3
  - text: >-
      Após as sessões de fisioterapia o paciente apresenta recuperação de
      mobilidade.
    example_title: Example 4
  - text: >-
      O paciente está em Quimioterapia com uma dosagem específica de Cisplatina
      para o tratamento do cancro do pulmão.
    example_title: Example 5
  - text: Monitorização da  Freq. cardíaca com 90 bpm. P Arterial de 120-80 mmHg
    example_title: Example 6
  - text: >-
      A ressonância magnética da utente revelou uma ruptura no menisco lateral
      do joelho.
    example_title: Example 7
  - text: >-
      A paciente foi diagnosticada com esclerose múltipla e iniciou terapia com
      imunomoduladores.

MediAlbertina

The first publicly available medical language models trained with real European Portuguese data.

MediAlbertina is a family of DeBERTaV2-based encoders from the BERT family, obtained by continuing the pre-training of PORTULAN's Albertina models on Electronic Medical Records (EMRs) shared by Portugal's largest public hospital.

Like its predecessors, MediAlbertina models are distributed under the MIT license.

Model Description

MediAlbertina PT-PT 900M NER was created through fine-tuning of MediAlbertina PT-PT 900M on real European Portuguese EMRs that have been hand-annotated for the following entities:

  • Diagnostico
  • Sintoma
  • Medicamento
  • Dosagem
  • ProcedimentoMedico
  • SinalVital
  • Resultado
  • Progresso
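
The entity types above can be used to organize predictions by type. The following is a minimal sketch, assuming predictions arrive as a list of dicts with `entity_group`, `start`, and `end` keys (the shape returned by an aggregated `transformers` NER pipeline) and that the model's labels match the names listed above exactly. The helper `group_by_type` and the sample predictions are hypothetical and written by hand for illustration; they are not real model output.

```python
from collections import defaultdict

# Entity types listed in this model card.
ENTITY_TYPES = (
    "Diagnostico", "Sintoma", "Medicamento", "Dosagem",
    "ProcedimentoMedico", "SinalVital", "Resultado", "Progresso",
)


def group_by_type(text, predictions):
    """Map each known entity type to the text spans predicted for it."""
    grouped = defaultdict(list)
    for pred in predictions:
        label = pred["entity_group"]
        if label in ENTITY_TYPES:  # ignore labels outside the documented set
            grouped[label].append(text[pred["start"]:pred["end"]])
    return dict(grouped)


# Hand-written predictions for illustration only (assumed, not model output).
text = "Foi recomendada aspirina de 500mg a cada 4 horas, durante 3 dias."
sample = [
    {"entity_group": "Medicamento", "start": 16, "end": 24},
    {"entity_group": "Dosagem", "start": 28, "end": 33},
]
print(group_by_type(text, sample))
```

Grouping by type this way makes it straightforward to feed, for example, only the Medicamento and Dosagem spans into a downstream step.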

MediAlbertina PT-PT 900M NER outperformed the same fine-tuning applied to a non-medical Portuguese language model (albertina-900m-portuguese-ptpt-encoder) on all three evaluated tasks, which demonstrates the effectiveness of this domain adaptation and its potential for medical AI in Portugal.

| Model | NER single-model (F1-score) | NER multi-models (F1-score) | Assertion Status (F1-score) |
|---|---|---|---|
| albertina-900m-portuguese-ptpt-encoder | 0.813 | 0.811 | 0.687 |
| medialbertina_pt-pt_900m | 0.832 | 0.848 | 0.755 |
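
To make the size of the improvement concrete, the short sketch below computes the absolute and relative F1 gains per task from the scores reported above. The helper name `f1_gains` is hypothetical; the numbers are copied from the table and nothing else is assumed.

```python
# F1-scores copied from the results reported above.
baseline = {"NER single-model": 0.813, "NER multi-models": 0.811, "Assertion Status": 0.687}
medical = {"NER single-model": 0.832, "NER multi-models": 0.848, "Assertion Status": 0.755}


def f1_gains(base, adapted):
    """Return {task: (absolute gain, relative gain in %)}, rounded for display."""
    return {
        task: (
            round(adapted[task] - base[task], 3),
            round(100 * (adapted[task] - base[task]) / base[task], 1),
        )
        for task in base
    }


gains = f1_gains(baseline, medical)
for task, (absolute, relative) in gains.items():
    print(f"{task}: +{absolute:.3f} F1 ({relative:.1f}% relative)")
```

The gain is smallest for single-model NER and largest for assertion status.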

Data

MediAlbertina PT-PT 900M NER was fine-tuned on more than 10k hand-annotated entities from more than a thousand fully anonymized medical sentences from Portugal's largest public hospital. This data was acquired under the framework of the FCT project DSAIPA/AI/0122/2020 AIMHealth-Mobile Applications Based on Artificial Intelligence.

How to use

from transformers import pipeline

# Load the fine-tuned NER model; 'average' merges sub-word tokens into whole-word entities.
ner_pipeline = pipeline('ner', model='portugueseNLP/medialbertina_pt-pt_900m_NER', aggregation_strategy='average')

sentence = 'Durante o procedimento endoscópico, foram encontrados pólipos no cólon do paciente.'
entities = ner_pipeline(sentence)

# Print each predicted entity type together with the text span it covers.
for entity in entities:
    print(f"{entity['entity_group']} - {sentence[entity['start']:entity['end']]}")
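
The snippet above sets `aggregation_strategy='average'`. According to the `transformers` documentation, this strategy averages the scores of a word's sub-word tokens first and then assigns the label with the highest average, so a word cannot end up with mixed tags. The toy sketch below illustrates that idea only; it is not the library's implementation. The function `average_word_label` and the scores are hypothetical, and every token is assumed to carry a score for the same set of labels.

```python
from collections import defaultdict


def average_word_label(token_scores):
    """Pick one label for a word from the per-label scores of its sub-word tokens.

    token_scores: list of dicts (label -> score), one dict per sub-word token.
    Returns (label, averaged score) for the label with the highest mean score.
    """
    totals = defaultdict(float)
    for scores in token_scores:
        for label, score in scores.items():
            totals[label] += score
    n_tokens = len(token_scores)
    averaged = {label: total / n_tokens for label, total in totals.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged[best]


# Hypothetical scores for a word split into two sub-word tokens.
word_tokens = [
    {"Sintoma": 0.6, "Diagnostico": 0.4},
    {"Sintoma": 0.3, "Diagnostico": 0.7},
]
label, score = average_word_label(word_tokens)
print(label, round(score, 2))
```

In this example the two tokens disagree when taken individually, but averaging gives the whole word a single label.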

Citation

MediAlbertina is developed by a joint team from ISCTE-IUL, Portugal, and Select Data, CA, USA. For a full description, see the corresponding publication:

The publication is currently in the publishing process. The reference will be added soon.

Please use the above canonical reference when using or citing this model.