DmitryPogrebnoy committed on
Commit 1686315 · 1 Parent(s): fd59e3e

Update README.md

Files changed (1)
  1. README.md +55 -0
README.md CHANGED
@@ -1,3 +1,58 @@
---
language:
- ru
license: apache-2.0
---

# Model DmitryPogrebnoy/MedDistilBertBaseRuCased

# Model Description

This model is a fine-tuned version of [DmitryPogrebnoy/distilbert-base-russian-cased](https://huggingface.co/DmitryPogrebnoy/distilbert-base-russian-cased).
The code for the fine-tuning process can be found [here](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/spellchecker/ml_ranging/models/med_distilbert_base_russian_cased/fine_tune_distilbert_base_russian_cased.py).
The model is fine-tuned on a specially collected dataset of over 30,000 medical anamneses in Russian.
The collected dataset can be found [here](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/data/anamnesis/processed/all_anamnesis.csv).

This model was created as part of a master's project to develop a method for correcting typos
in medical histories that uses BERT models to rank correction candidates.
The project is open source and can be found [here](https://github.com/DmitryPogrebnoy/MedSpellChecker).

# How to Get Started With the Model

You can use the model directly with a pipeline for masked language modeling:

```python
>>> from transformers import pipeline
>>> # Name the result `unmasker` so it does not shadow the imported `pipeline` function
>>> unmasker = pipeline('fill-mask', model='DmitryPogrebnoy/MedDistilBertBaseRuCased')
>>> # "The patient [MASK] pain in the sternum."
>>> unmasker("У пациента [MASK] боль в грудине.")
[{'score': 0.1733243614435196,
  'token': 6880,
  'token_str': 'имеется',
  'sequence': 'У пациента имеется боль в грудине.'},
 {'score': 0.08818087726831436,
  'token': 1433,
  'token_str': 'есть',
  'sequence': 'У пациента есть боль в грудине.'},
 {'score': 0.03620537742972374,
  'token': 3793,
  'token_str': 'особенно',
  'sequence': 'У пациента особенно боль в грудине.'},
 {'score': 0.03438418731093407,
  'token': 5168,
  'token_str': 'бол',
  'sequence': 'У пациента бол боль в грудине.'},
 {'score': 0.032936397939920425,
  'token': 6281,
  'token_str': 'протекает',
  'sequence': 'У пациента протекает боль в грудине.'}]
```
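
The project uses BERT models to rank typo-correction candidates. A minimal sketch of that idea, assuming the misspelled word has already been replaced with `[MASK]` and the candidate words come from some separate generator (not shown here): order the candidates by the score the fill-mask pipeline assigned to them. The helper `rank_candidates` is hypothetical and is not taken from the MedSpellChecker code; the sample predictions are copied from the output above.

```python
from typing import Dict, List, Sequence, Tuple


def rank_candidates(
    predictions: Sequence[Dict],
    candidates: Sequence[str],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: order correction candidates by the score a
    fill-mask pipeline assigned to them, highest first.

    Candidates that do not appear in `predictions` get a score of 0.0.
    """
    # Map each predicted word to its probability; strip() guards against
    # stray whitespace in token_str.
    scores = {p["token_str"].strip(): p["score"] for p in predictions}
    ranked = [(cand, scores.get(cand, 0.0)) for cand in candidates]
    # sorted() stays stable with reverse=True, so equal scores keep input order.
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Two of the predictions shown above for "У пациента [MASK] боль в грудине."
predictions = [
    {"score": 0.1733243614435196, "token_str": "имеется"},
    {"score": 0.08818087726831436, "token_str": "есть"},
]
print(rank_candidates(predictions, ["есть", "имеется", "нет"]))
# [('имеется', 0.1733243614435196), ('есть', 0.08818087726831436), ('нет', 0.0)]
```

Because candidates outside the pipeline's top predictions fall back to 0.0, you may prefer to pass them as targets, e.g. `unmasker(text, targets=candidates)`, so the pipeline scores exactly those words. According to the transformers documentation, a target that is not in the model vocabulary is tokenized and only its first sub-token is used, so multi-token candidates are scored only approximately.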

Alternatively, you can load the tokenizer and model directly and use them as you need:

```python
>>> from transformers import AutoTokenizer, AutoModelForMaskedLM
>>> tokenizer = AutoTokenizer.from_pretrained("DmitryPogrebnoy/MedDistilBertBaseRuCased")
>>> model = AutoModelForMaskedLM.from_pretrained("DmitryPogrebnoy/MedDistilBertBaseRuCased")
```
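
With the tokenizer and model loaded this way, you can compute the most likely fillers for a single mask token yourself. The sketch below roughly mirrors what the fill-mask pipeline reports: a softmax over the vocabulary at the mask position, followed by a top-k selection. The helper `top_k_for_mask` is hypothetical; it assumes PyTorch is installed (required for `return_tensors="pt"`) and that the input contains exactly one mask token.

```python
def top_k_for_mask(text, tokenizer, model, k=5):
    """Hypothetical helper: return the k most likely (token, probability)
    pairs for the single mask token in `text`."""
    inputs = tokenizer(text, return_tensors="pt")  # PyTorch tensors
    # Locate the mask token in the first (and only) input sequence.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_positions = is_mask.nonzero(as_tuple=True)[0]
    if len(mask_positions) != 1:
        raise ValueError("expected exactly one mask token in the input")
    model.eval()  # disable dropout so predictions are deterministic
    # detach() drops the autograd graph; only the scores are needed here.
    logits = model(**inputs).logits.detach()
    # Probabilities over the vocabulary at the mask position, then the top k.
    probs = logits[0, mask_positions[0]].softmax(dim=-1)
    scores, token_ids = probs.topk(k)
    return [
        (tokenizer.decode([int(token_id)]).strip(), float(score))
        for score, token_id in zip(scores, token_ids)
    ]
```

For example, `top_k_for_mask(f"У пациента {tokenizer.mask_token} боль в грудине.", tokenizer, model)` should yield the same top candidates as the pipeline example above. Building the input with `tokenizer.mask_token` avoids hard-coding `[MASK]`.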