Model Card for Model ID

These models aim to recognise occupation mentions (NER) in Spanish clinical notes and to identify to whom each occupation belongs.

Model Details

| PLM Model | Learning rate | Batch size | Epochs | Max length | Optimizer | Max clip grad norm | Epsilon |
|---|---|---|---|---|---|---|---|
| PlanTL-GOB-ES/roberta-base-biomedical-es | 2e-05 | 8 | 10 | 510 | AdamW | 1 | 1e-08 |
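
The hyperparameters above map fairly directly onto a PyTorch training step. The following is a minimal sketch, not the author's training script: the small linear layer (`toy_model`) is a hypothetical stand-in for the fine-tuned network and the random batch is purely illustrative. Only the learning rate, batch size, epsilon and gradient-clipping values come from the table.

```python
import torch
from torch import nn

# Values taken from the hyperparameter table
LEARNING_RATE = 2e-05
BATCH_SIZE = 8
EPSILON = 1e-08
MAX_GRAD_NORM = 1.0

# Hypothetical stand-in for the fine-tuned network (assumption, not the real model)
torch.manual_seed(0)
toy_model = nn.Linear(16, 8)

optimizer = torch.optim.AdamW(toy_model.parameters(), lr=LEARNING_RATE, eps=EPSILON)

# One illustrative training step on random data
features = torch.randn(BATCH_SIZE, 16)
targets = torch.randint(0, 8, (BATCH_SIZE,))
loss = nn.functional.cross_entropy(toy_model(features), targets)

optimizer.zero_grad()
loss.backward()

# Rescale the gradients so that their global L2 norm is at most MAX_GRAD_NORM
torch.nn.utils.clip_grad_norm_(toy_model.parameters(), MAX_GRAD_NORM)
grad_norm_after_clipping = torch.sqrt(
    sum((p.grad ** 2).sum() for p in toy_model.parameters())
)

optimizer.step()
```

Clipping is applied between `backward()` and `step()`, so the update never uses a gradient whose global norm exceeds the configured maximum.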

Model Description

The PlanTL-GOB-ES/roberta-base-biomedical-es model was fine-tuned on the MEDDOPROF corpus (Salvador Lima-López, Eulàlia Farré-Maduell, Antonio Miranda-Escalada, Vicent Briva-Iglesias, & Martin Krallinger. (2022). MEDDOPROF corpus: complete gold standard annotations for occupation detection in medical documents in Spanish [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7116201).

Two models were built: a model for occupation recognition (MEDDO_FINAL_ROBERTA_ner_sentencia_510_8_10_2e-05_1e-08) and a model that detects to whom the occupation belongs (MEDDO_FINAL_ROBERTA_class_sentencia_510_8_10_2e-05_1e-08).

More details can be found in the MEDDOPROF shared task paper: Lima-López, S., Farré-Maduell, E., Miranda-Escalada, A., Briva-Iglesias, V., & Krallinger, M. (2021). NLP applied to occupational health: MEDDOPROF shared task at IberLEF 2021 on automatic recognition, classification and normalization of professions and occupations from medical texts. Procesamiento del Lenguaje Natural, 67, 243-256.

  • Developed by: Alfredo Madrid
  • Language(s) (NLP): Spanish
  • License: CC BY-SA 4.0
  • Finetuned from model: PlanTL-GOB-ES/roberta-base-biomedical-es

Uses

Model 1

import numpy as np
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the fine-tuned occupation NER model and its tokenizer
model_name = "MEDDO_FINAL_ROBERTA_ner_sentencia_510_8_10_2e-05_1e-08"
model = AutoModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

note = "El paciente trabaja en una empresa de construccion los jueves"

# Tokenize once; word_ids gives, for each sub-token, the index of the word it came from
encoding = tokenizer(note, truncation=True)
word_ids = encoding.word_ids(0)
input_ids = torch.tensor([encoding["input_ids"]])

# Predict the most likely label index for every sub-token
with torch.no_grad():
    output = model(input_ids)
label_indices = np.argmax(output[0].to('cpu').numpy(), axis=2)
tokens = tokenizer.convert_ids_to_tokens(input_ids.numpy()[0])

# One row per sub-token: the token, its predicted label and its word index
df = pd.DataFrame(zip(tokens, label_indices[0], word_ids), columns=["tokens", "labels", "relation"])
# Remove WordPiece continuation markers if present (tokens such as 'Ġpaciente' are left unchanged)
df['tokens'] = df['tokens'].str.replace('##', '')
df['labels'] = df['labels'].map({0: 'B-PROFESION', 1: 'B-SITUACION_LABORAL', 2: 'I-SITUACION_LABORAL', 3: 'I-ACTIVIDAD', 4: 'I-PROFESION', 5: 'O', 6: 'B-ACTIVIDAD', 7: 'PAD'})

# Drop the special start and end tokens, which have no word index
df = df.iloc[1:-1].copy()
df['relation'] = df['relation'].astype('int')

# Merge the sub-tokens of each word and keep the label of its first sub-token
df['tokens'] = df.groupby('relation')['tokens'].transform(lambda x: ''.join(x))
df = df.groupby('relation').first()
df

Output

relation tokens labels
0 ĠEl O
1 Ġpaciente O
2 Ġtrabaja B-PROFESION
3 Ġen I-PROFESION
4 Ġuna I-PROFESION
5 Ġempresa I-PROFESION
6 Ġde I-PROFESION
7 Ġconstruccion I-PROFESION
8 Ġlos O
9 Ġjueves O
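
The word-level output above is in BIO format, so a common next step is to group consecutive `B-`/`I-` labels into whole mentions. The helper below is a minimal sketch of that grouping. `bio_to_spans` is a hypothetical name, not part of the model or of `transformers`, and the word list is copied by hand from the output above with the `Ġ` marker removed.

```python
def bio_to_spans(words, labels):
    """Group word-level BIO labels into (entity_type, text) spans."""
    spans = []
    current_type = None
    current_words = []
    for word, label in zip(words, labels):
        prefix, _, entity_type = label.partition("-")
        # "I-X" directly after a span of type X continues that span
        if prefix == "I" and entity_type == current_type:
            current_words.append(word)
            continue
        # Any other label closes the span that was open, if there was one
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
        # "B-X" opens a span; a stray "I-X" is treated as an opening too
        if prefix in ("B", "I") and entity_type:
            current_type, current_words = entity_type, [word]
    # Close a span that runs to the end of the sentence
    if current_type is not None:
        spans.append((current_type, " ".join(current_words)))
    return spans


words = ["El", "paciente", "trabaja", "en", "una", "empresa", "de",
         "construccion", "los", "jueves"]
labels = ["O", "O", "B-PROFESION", "I-PROFESION", "I-PROFESION",
          "I-PROFESION", "I-PROFESION", "I-PROFESION", "O", "O"]

spans = bio_to_spans(words, labels)
print(spans)  # [('PROFESION', 'trabaja en una empresa de construccion')]
```

Labels without a type, such as `O` and `PAD`, never open a span, so they only act as separators.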

Model 2

import numpy as np
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the fine-tuned model that classifies to whom the occupation belongs, and its tokenizer
model_name = "MEDDO_FINAL_ROBERTA_class_sentencia_510_8_10_2e-05_1e-08"
model = AutoModelForTokenClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

note = "El paciente trabaja en una empresa de construccion los jueves"

# Tokenize once; word_ids gives, for each sub-token, the index of the word it came from
encoding = tokenizer(note, truncation=True)
word_ids = encoding.word_ids(0)
input_ids = torch.tensor([encoding["input_ids"]])

# Predict the most likely label index for every sub-token
with torch.no_grad():
    output = model(input_ids)
label_indices = np.argmax(output[0].to('cpu').numpy(), axis=2)
tokens = tokenizer.convert_ids_to_tokens(input_ids.to('cpu').numpy()[0])

# One row per sub-token: the token, its predicted label and its word index
df = pd.DataFrame(zip(tokens, label_indices[0], word_ids), columns=["tokens", "labels", "relation"])
# Remove WordPiece continuation markers if present (tokens such as 'Ġpaciente' are left unchanged)
df['tokens'] = df['tokens'].str.replace('##', '')
df['labels'] = df['labels'].map({0: 'B-FAMILIAR', 1: 'I-PACIENTE', 2: 'I-OTROS', 3: 'B-SANITARIO', 4: 'B-PACIENTE', 5: 'I-FAMILIAR', 6: 'O', 7: 'B-OTROS', 8: 'I-SANITARIO', 9: 'PAD'})

# Drop the special start and end tokens, which have no word index
df = df.iloc[1:-1].copy()
df['relation'] = df['relation'].astype('int')

# Merge the sub-tokens of each word and keep the label of its first sub-token
df['tokens'] = df.groupby('relation')['tokens'].transform(lambda x: ''.join(x))
df = df.groupby('relation').first()
df

Output

relation tokens labels
0 ĠEl O
1 Ġpaciente O
2 Ġtrabaja B-PACIENTE
3 Ġen I-PACIENTE
4 Ġuna I-PACIENTE
5 Ġempresa I-PACIENTE
6 Ġde I-PACIENTE
7 Ġconstruccion I-PACIENTE
8 Ġlos O
9 Ġjueves O
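
Because both models label the same words, their outputs can be combined so that each occupation mention found by Model 1 carries the holder predicted by Model 2. The following is one possible way to do that, not a procedure described by the author: `merge_predictions` is a hypothetical helper, and the majority vote over the holder labels inside each mention is an assumption made for this sketch. The words and labels are copied by hand from the two outputs above, with the `Ġ` marker removed.

```python
from collections import Counter


def merge_predictions(words, ner_labels, holder_labels):
    """Attach a holder class (Model 2) to each occupation span (Model 1)."""
    mentions = []
    span = None  # the occupation mention currently being built

    def close(finished):
        # Majority vote over the typed holder labels inside the span (assumption)
        votes = Counter(h.partition("-")[2] for h in finished["holders"] if "-" in h)
        holder = votes.most_common(1)[0][0] if votes else None
        mentions.append({
            "text": " ".join(finished["words"]),
            "type": finished["type"],
            "holder": holder,
        })

    for word, ner, holder in zip(words, ner_labels, holder_labels):
        prefix, _, entity_type = ner.partition("-")
        # "I-X" directly after a span of type X continues that span
        if prefix == "I" and span is not None and span["type"] == entity_type:
            span["words"].append(word)
            span["holders"].append(holder)
            continue
        # Any other label closes the open span
        if span is not None:
            close(span)
            span = None
        # "B-X" (or a stray "I-X") opens a new span
        if prefix in ("B", "I") and entity_type:
            span = {"type": entity_type, "words": [word], "holders": [holder]}
    if span is not None:
        close(span)
    return mentions


words = ["El", "paciente", "trabaja", "en", "una", "empresa", "de",
         "construccion", "los", "jueves"]
ner_labels = ["O", "O", "B-PROFESION"] + ["I-PROFESION"] * 5 + ["O", "O"]
holder_labels = ["O", "O", "B-PACIENTE"] + ["I-PACIENTE"] * 5 + ["O", "O"]

mentions = merge_predictions(words, ner_labels, holder_labels)
print(mentions)
# [{'text': 'trabaja en una empresa de construccion', 'type': 'PROFESION', 'holder': 'PACIENTE'}]
```

Span boundaries are taken from Model 1 only, so a disagreement between the two models about where a mention starts or ends does not split the mention.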