metadata
library_name: transformers
tags:
- medical
language:
- ru
base_model:
- Babelscape/wikineural-multilingual-ner
datasets:
- Mykes/patient_queries_ner
Model Card
A model for named entity recognition (NER) in medical requests
Model Description
This model is fine-tuned on 4,756 Russian patient queries from the Mykes/patient_queries_ner dataset.
The NER entities are:
- B-SIM, I-SIM: symptoms;
- B-SUBW, I-SUBW: subway station;
- GEN: gender;
- CHILD: child mention;
- B-SPEC, I-SPEC: physician speciality.
It is based on Babelscape/wikineural-multilingual-ner, a 177M-parameter mBERT model.
Training info
Training parameters:
MAX_LEN = 256
TRAIN_BATCH_SIZE = 4
VALID_BATCH_SIZE = 2
EPOCHS = 5
LEARNING_RATE = 1e-05
MAX_GRAD_NORM = 10
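The name MAX_GRAD_NORM suggests gradient-norm clipping, although the card does not show the training loop. As a minimal sketch under that assumption, the pure-Python function below (the helper `clip_by_global_norm` is hypothetical and for illustration only) rescales gradients so that their global L2 norm never exceeds the limit. In a PyTorch training loop the same effect is usually obtained with `torch.nn.utils.clip_grad_norm_`.

```python
import math

MAX_GRAD_NORM = 10  # value taken from the training parameters above


def clip_by_global_norm(grads, max_norm=MAX_GRAD_NORM):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm.

    Hypothetical helper for illustration; returns the clipped values and the
    norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads), total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# A gradient with norm 50 is scaled down to norm 10; its direction is unchanged.
clipped, norm_before = clip_by_global_norm([30.0, 40.0])
norm_after = math.sqrt(sum(g * g for g in clipped))
```

With a limit as high as 10, clipping would only kick in on unusually large updates, so most steps would pass through unchanged.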
Loss and accuracy at epoch 5:
Training loss epoch: 0.004890048759878736
Training accuracy epoch: 0.9896078955134066
Validation results:
Validation Loss: 0.008194072216433625
Validation Accuracy: 0.9859073599112612
Detailed metrics:
              precision    recall  f1-score   support
         GEN       1.00      0.98      0.99        84
       CHILD       1.00      0.99      0.99       436
         SIM       0.96      0.96      0.96      5355
        SPEC       0.99      1.00      0.99       751
        SUBW       0.99      1.00      0.99       327
   micro avg       0.96      0.97      0.97      6953
   macro avg       0.99      0.98      0.99      6953
weighted avg       0.96      0.97      0.97      6953
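To make the averages in the report easier to read, the sketch below recomputes the support-weighted F1 from the per-entity rows. It uses only the rounded two-decimal values printed above, so it is an approximation of what the evaluation tool computed from unrounded scores. The dictionary keys follow the entity list in the model description, in table row order.

```python
# Per-entity (f1-score, support) pairs copied from the report above, in row order.
# Values are rounded to two decimals, so the result is approximate.
report = {
    "GEN": (0.99, 84),
    "CHILD": (0.99, 436),
    "SIM": (0.96, 5355),
    "SPEC": (0.99, 751),
    "SUBW": (0.99, 327),
}

total_support = sum(support for _, support in report.values())

# Weighted average: each entity type contributes in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total_support

# Macro average: every entity type counts equally, regardless of support.
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

print(total_support, round(weighted_f1, 2))
```

Because SIM accounts for 5355 of the 6953 entities, the weighted average sits close to the SIM score, while the macro average is pulled up by the smaller, better-scoring classes.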
Results:
The model does not always identify words completely, but it correctly detects individual word pieces even when the words are misspelled.
For example, the query "У меня треога и норушения сна. Подскажи хорошего психотервта в районе метро Октбрьской." (a deliberately misspelled version of "I have anxiety and sleep disturbances. Recommend a good psychotherapist near the Oktyabrskaya metro station.") returns the result:
B-SIM I-SIM I-SIM B-SIM I-SIM I-SIM B-SPEC I-SPEC I-SPEC I-SPEC I-SPEC B-SUBW I-SUBW I-SUBW I-SUBW
т ре ога но ру шения сна пс их о тер вта ок т брь ской
As you can see, it correctly detects even the misspelled words: треога, норушения, психотервта.
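Piece-level predictions like these can be merged back into entity strings. The following is a minimal sketch, not the model's own post-processing: the helper `group_bio` is hypothetical, and it assumes that all pieces inside one span belong to a single word, so they are joined without spaces. It uses only the "треога" and "психотервта" pieces from the example above.

```python
def group_bio(pieces, tags):
    """Merge BIO-tagged word pieces into (label, text) spans.

    Hypothetical helper: pieces in one span are concatenated without spaces,
    which assumes they all belong to the same word.
    """
    spans = []
    current = None  # [label, text] of the span being built
    for piece, tag in zip(pieces, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current is not None and current[0] == label:
            current[1] += piece  # continue the open span
            continue
        if current is not None:
            spans.append(tuple(current))  # close the previous span
            current = None
        if prefix in ("B", "I"):
            current = [label, piece]  # open a new span
    if current is not None:
        spans.append(tuple(current))
    return spans


pieces = ["т", "ре", "ога", "пс", "их", "о", "тер", "вта"]
tags = ["B-SIM", "I-SIM", "I-SIM",
        "B-SPEC", "I-SPEC", "I-SPEC", "I-SPEC", "I-SPEC"]
entities = group_bio(pieces, tags)
```

Multi-word entities would need the character offsets from the tokenizer to restore spaces, which the pipeline's aggregation strategies handle automatically.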
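The simplest way to use the model is with the 🤗 transformers pipeline:
from transformers import pipeline

# "average" averages token scores within each word, then groups words into entities
pipe = pipeline(task="ner", model='Mykes/med_bert_ner', tokenizer='Mykes/med_bert_ner', aggregation_strategy="average")
query = "У меня болит голова. Посоветуй невролога на проспекте мира"
# Lowercase the query and strip surrounding punctuation and whitespace before inference
results = pipe(query.lower().strip('.,\n '))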
# The output:
# [{'entity_group': 'SIM',
# 'score': 0.9920678,
# 'word': 'болит голова',
# 'start': 7,
# 'end': 19},
# {'entity_group': 'SPEC',
# 'score': 0.9985348,
# 'word': 'невролога',
# 'start': 31,
# 'end': 40},
# {'entity_group': 'SUBW',
# 'score': 0.68749845,
# 'word': 'проспекте мира',
# 'start': 44,
# 'end': 58}]
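Downstream code often needs the entities grouped by type, for example to route the query to a specialist search. The sketch below works on the output shown above, pasted in as a literal so that it runs without downloading the model. The helper `entities_by_type` and its `min_score` threshold are hypothetical and not part of the model or of transformers.

```python
from collections import defaultdict

# Pipeline output copied from the example above.
results = [
    {"entity_group": "SIM", "score": 0.9920678, "word": "болит голова", "start": 7, "end": 19},
    {"entity_group": "SPEC", "score": 0.9985348, "word": "невролога", "start": 31, "end": 40},
    {"entity_group": "SUBW", "score": 0.68749845, "word": "проспекте мира", "start": 44, "end": 58},
]


def entities_by_type(results, min_score=0.5):
    """Group entity texts by label, dropping low-confidence predictions.

    Hypothetical helper; min_score is an illustrative threshold that should be
    tuned on real data.
    """
    grouped = defaultdict(list)
    for item in results:
        if item["score"] >= min_score:
            grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


parsed = entities_by_type(results)
confident = entities_by_type(results, min_score=0.9)
```

Here `parsed` keeps all three entity types, while the stricter threshold in `confident` drops the lower-scoring SUBW prediction.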