---
library_name: transformers
tags:
- medical
language:
- ru
base_model:
- Babelscape/wikineural-multilingual-ner
datasets:
- Mykes/patient_queries_ner
---

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63565a3d58acee56a457f799/2M8J-5WABcDZe1TwY4HXk.png)

# Model Card

A named entity recognition (NER) model for Russian-language medical queries from patients.

### Model Description

This model is fine-tuned on 4756 Russian patient queries from the [Mykes/patient_queries_ner](https://huggingface.co/datasets/Mykes/patient_queries_ner) dataset.

**The NER entities are**:

- **B-SIM, I-SIM**: symptoms;
- **B-SUBW, I-SUBW**: subway (metro) station;
- **GEN**: gender;
- **CHILD**: child mention;
- **B-SPEC, I-SPEC**: physician specialty.

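The tag list above mixes prefixed tags (`B-`/`I-`) with unprefixed ones (`GEN`, `CHILD`), so downstream code may need to handle both forms. The following is a minimal sketch of one way to do that; the helper name `split_tag` and the presence of an `O` (outside) tag are assumptions, not something this card states.

```python
def split_tag(tag: str) -> tuple[str, str]:
    """Split a tag into (prefix, entity_type).

    Prefixed tags such as "B-SIM" give ("B", "SIM"); unprefixed tags
    such as "GEN" give ("", "GEN").
    """
    prefix, sep, entity = tag.partition("-")
    if sep and prefix in ("B", "I"):
        return prefix, entity
    return "", tag


# Tags as listed in this card; "O" (outside any entity) is an assumption.
TAGS = ["O", "B-SIM", "I-SIM", "B-SUBW", "I-SUBW", "GEN", "CHILD", "B-SPEC", "I-SPEC"]

# Collect the distinct entity types, ignoring the assumed "O" tag.
entity_types = sorted({split_tag(t)[1] for t in TAGS if t != "O"})
print(entity_types)  # ['CHILD', 'GEN', 'SIM', 'SPEC', 'SUBW']
```
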

The model is based on [Babelscape/wikineural-multilingual-ner](https://huggingface.co/Babelscape/wikineural-multilingual-ner), a 177M-parameter mBERT model.

## Training info

Training parameters:

```
MAX_LEN = 256
TRAIN_BATCH_SIZE = 4
VALID_BATCH_SIZE = 2
EPOCHS = 5
LEARNING_RATE = 1e-05
MAX_GRAD_NORM = 10
```

Training loss and accuracy after epoch 5:

```
Training loss epoch: 0.004890048759878736
Training accuracy epoch: 0.9896078955134066
```

Validation results:

```
Validation Loss: 0.008194072216433625
Validation Accuracy: 0.9859073599112612
```

Detailed per-entity metrics (precision, recall, F1-score, support):

```
              precision    recall  f1-score   support

         GEN       1.00      0.98      0.99        84
       CHILD       1.00      0.99      0.99       436
         SIM       0.96      0.96      0.96      5355
        SPEC       0.99      1.00      0.99       751
        SUBW       0.99      1.00      0.99       327

   micro avg       0.96      0.97      0.97      6953
   macro avg       0.99      0.98      0.99      6953
weighted avg       0.96      0.97      0.97      6953
```
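
To make the relationship between the columns concrete, here is a small sketch that recomputes the F1-score from precision and recall and the support-weighted average F1 from the per-entity rows. The numbers are copied from the report and are rounded to two decimals, so a recomputed value can differ slightly from what the original evaluation tool printed; the variable and function names are illustrative only.

```python
# Per-entity rows from the report: (precision, recall, f1, support).
report = {
    "GEN": (1.00, 0.98, 0.99, 84),
    "CHILD": (1.00, 0.99, 0.99, 436),
    "SIM": (0.96, 0.96, 0.96, 5355),
    "SPEC": (0.99, 1.00, 0.99, 751),
    "SUBW": (0.99, 1.00, 0.99, 327),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# F1 recomputed from the rounded precision and recall of each entity.
recomputed = {name: round(f1_score(p, r), 2) for name, (p, r, _, _) in report.items()}

# The weighted average weights each entity's F1 by its support.
total_support = sum(s for _, _, _, s in report.values())
weighted_f1 = sum(f * s for _, _, f, s in report.values()) / total_support

print(recomputed)
print(total_support, round(weighted_f1, 2))  # 6953 0.97
```

Because `SIM` accounts for most of the support, the weighted average sits close to its F1-score.
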

## Results:

The model does not always tag a word as a whole, but it labels the individual word pieces correctly even when the word is misspelled.

For example, the query "У меня треога и норушения сна. Подскажи хорошего психотервта в районе метро Октбрьской." (roughly "I have anxiety and sleep disturbances. Recommend a good psychotherapist near the Oktyabrskaya metro station", written with typos) returns the result:

```
B-SIM I-SIM I-SIM B-SIM I-SIM I-SIM B-SPEC I-SPEC I-SPEC I-SPEC I-SPEC B-SUBW I-SUBW I-SUBW I-SUBW
т ре ога но ру шения сна пс их о тер вта ок т брь ской
```

As you can see, it correctly detects even the misspelled words: треога, норушения, психотервта.
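
Piece-level predictions like the ones above can be merged back into entity spans. The sketch below shows one possible way to do it, assuming continuation pieces carry the `##` marker used by BERT WordPiece tokenizers and that tokens outside any entity are tagged `O`; the function name `merge_pieces` and the example tags are hypothetical, not taken from the model's actual output.

```python
def merge_pieces(pieces, tags):
    """Group (piece, tag) pairs into (entity_type, text) spans."""
    spans = []
    current = None  # span being extended, or None after an "O" tag
    for piece, tag in zip(pieces, tags):
        if tag == "O":
            current = None
            continue
        prefix, sep, etype = tag.partition("-")
        if not sep:  # unprefixed tag such as GEN or CHILD starts its own span
            prefix, etype = "B", tag
        is_continuation = piece.startswith("##")
        text = piece[2:] if is_continuation else piece
        if prefix == "I" and current is not None and current[0] == etype:
            # Glue continuation pieces directly; separate new words with a space.
            current[1] += text if is_continuation else " " + text
        else:
            current = [etype, text]
            spans.append(current)
    return [tuple(span) for span in spans]


# Hypothetical piece-level input for part of the misspelled query.
pieces = ["пс", "##их", "##о", "##тер", "##вта", "в", "районе", "метро",
          "ок", "##т", "##брь", "##ской"]
tags = ["B-SPEC", "I-SPEC", "I-SPEC", "I-SPEC", "I-SPEC", "O", "O", "O",
        "B-SUBW", "I-SUBW", "I-SUBW", "I-SUBW"]

print(merge_pieces(pieces, tags))
# [('SPEC', 'психотервта'), ('SUBW', 'октбрьской')]
```
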

## The simplest way to use the model with 🤗 transformers pipeline:

```
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
pipe = pipeline(
    task="ner",
    model="Mykes/med_bert_ner",
    tokenizer="Mykes/med_bert_ner",
    aggregation_strategy="average",  # merge word pieces into word-level entities
)

# "I have a headache. Recommend a neurologist on Prospekt Mira"
query = "У меня болит голова. Посоветуй невролога на проспекте мира"
# Lowercase the query and strip punctuation and whitespace from both ends.
results = pipe(query.lower().strip('.,\n '))

# The output:
# [{'entity_group': 'SIM',
#   'score': 0.9920678,
#   'word': 'болит голова',
#   'start': 7,
#   'end': 19},
#  {'entity_group': 'SPEC',
#   'score': 0.9985348,
#   'word': 'невролога',
#   'start': 31,
#   'end': 40},
#  {'entity_group': 'SUBW',
#   'score': 0.68749845,
#   'word': 'проспекте мира',
#   'start': 44,
#   'end': 58}]
```
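
The pipeline returns a flat list of entity dictionaries. If a lookup by entity type is more convenient, something like the following sketch could be used. It runs on the sample output shown above rather than on a live model; the helper `group_entities` and its `min_score` threshold are illustrative assumptions, not part of the transformers API.

```python
from collections import defaultdict

# Sample pipeline output copied from the example above.
results = [
    {"entity_group": "SIM", "score": 0.9920678, "word": "болит голова", "start": 7, "end": 19},
    {"entity_group": "SPEC", "score": 0.9985348, "word": "невролога", "start": 31, "end": 40},
    {"entity_group": "SUBW", "score": 0.68749845, "word": "проспекте мира", "start": 44, "end": 58},
]


def group_entities(results, min_score=0.5):
    """Map each entity group to the words found for it, dropping low-confidence hits."""
    grouped = defaultdict(list)
    for item in results:
        if item["score"] >= min_score:
            grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


print(group_entities(results))
# {'SIM': ['болит голова'], 'SPEC': ['невролога'], 'SUBW': ['проспекте мира']}
print(group_entities(results, min_score=0.9))
# {'SIM': ['болит голова'], 'SPEC': ['невролога']}
```

A higher threshold drops the less certain `SUBW` match in this example, so the right value depends on how costly a wrong station is for the application.
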