
🤗 + 📚🩺🇮🇹 + 📖✍🏻🧑‍⚕️ = MedPsyNIT

From this repository you can download the MedPsyNIT (Medical Psychiatric Ner for ITalian) checkpoint.
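
A minimal sketch of loading the checkpoint with the Hugging Face `transformers` library. It assumes the repository id is `IVN-RIN/MedPsyNIT` and that the checkpoint exposes a standard token-classification head. The helper names `build_ner_pipeline` and `extract_entities` are illustrative, and the label names returned depend on the checkpoint's own configuration.

```python
MODEL_ID = "IVN-RIN/MedPsyNIT"  # assumed repository id


def build_ner_pipeline(model_id: str = MODEL_ID):
    """Return a token-classification pipeline for the given checkpoint.

    The import is deferred so this module loads without transformers
    installed; the model weights are downloaded on the first call.
    """
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def extract_entities(text: str, ner=None):
    """Run the pipeline and return (entity_group, word, score) tuples."""
    if ner is None:
        ner = build_ner_pipeline()
    return [(e["entity_group"], e["word"], float(e["score"])) for e in ner(text)]
```

Calling `extract_entities` on an Italian clinical sentence should return one tuple per detected span. Passing a prebuilt pipeline as `ner` avoids reloading the model on every call.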

MedPsyNIT is built on top of BioBIT and fine-tuned on a native Italian NER (Named Entity Recognition) dataset compiled by four Italian hospitals. The entity classes in the dataset are:

  • Diagnosis and comorbidities (779 examples, 13.23% of the dataset)
  • Cognitive symptoms (2386 examples, 40.52% of the dataset)
  • Neuropsychiatric symptoms (707 examples, 12.01% of the dataset)
  • Drug treatment (162 examples, 2.75% of the dataset)
  • Medical assessment (1854 examples, 31.49% of the dataset)
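
The percentages above follow directly from the per-class counts, which sum to 5888 examples. A small sketch that recomputes them and makes the class imbalance explicit (the names `CLASS_COUNTS` and `class_shares` are illustrative):

```python
# Example counts per entity class, as listed above.
CLASS_COUNTS = {
    "Diagnosis and comorbidities": 779,
    "Cognitive symptoms": 2386,
    "Neuropsychiatric symptoms": 707,
    "Drug treatment": 162,
    "Medical assessment": 1854,
}


def class_shares(counts):
    """Return each class's share of the dataset as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


total_examples = sum(CLASS_COUNTS.values())
shares = class_shares(CLASS_COUNTS)

for name, pct in shares.items():
    print(f"{name}: {CLASS_COUNTS[name]} examples ({pct}%)")
print(f"Total: {total_examples} examples")
```

The largest class (Cognitive symptoms) has roughly fifteen times as many examples as the smallest (Drug treatment).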

We designed a set of experiments to mitigate annotation inconsistencies and to give the models the best possible generalization capabilities. The process highlighted a fundamental finding: a multicenter model used out-of-the-box is not effective and would likely perform poorly. However, a few hundred high-quality, consistent examples, combined with a low-resource fine-tuning approach, can greatly enhance extraction quality. We believe this finding applies to other medical institutions and clinical settings, paving the way for the development of biomedical NER models in less-resourced languages. More details are available in the paper.

MedPsyNIT was evaluated during fine-tuning by splitting the dataset into train (90%) and test (10%) sets. The fine-tuning procedure was repeated ten times for each model, initializing each run with a different random state, in order to minimize the effect of randomness and to evaluate the models' stability. Here are the results, summarized per entity class:

  • Diagnosis and comorbidities: 76.12%
  • Cognitive symptoms: 73.01%
  • Neuropsychiatric symptoms: 77.78%
  • Drug treatment: 89.18%
  • Medical assessment: 89.59%
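
The evaluation protocol described above (a 90/10 split repeated over ten random states) can be sketched as follows. This is not the authors' code: it assumes that the seed controls both the split and the training run, and `train_and_score` is a hypothetical callback standing in for fine-tuning plus scoring.

```python
import random
from statistics import mean, stdev


def split_train_test(examples, test_fraction=0.1, seed=0):
    """Shuffle with a fixed seed and hold out `test_fraction` for testing."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def repeated_runs(examples, train_and_score, seeds=range(10)):
    """Repeat split + training once per seed; return mean and std of scores.

    `train_and_score(train, test, seed)` is a hypothetical callback that
    fine-tunes a model on `train` and returns its score on `test`.
    """
    scores = []
    for seed in seeds:
        train, test = split_train_test(examples, seed=seed)
        scores.append(train_and_score(train, test, seed))
    return mean(scores), stdev(scores)
```

Reporting the standard deviation alongside the mean gives a rough view of stability across runs.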

Check the full paper for further details, and feel free to contact us if you have any questions!

Model size: 109M params (Safetensors; tensor types: I64, F32)

Dataset used to train IVN-RIN/MedPsyNIT