DmitryPogrebnoy committed on
Commit
6bddd01
1 Parent(s): 428acc6

Update README.md

---
language:
- ru
license: apache-2.0
---

# Model DmitryPogrebnoy/MedRuBertTiny2

# Model Description

This model is a fine-tuned version of [cointegrated/rubert-tiny2](https://huggingface.co/cointegrated/rubert-tiny2).
The code for the fine-tuning process can be found [here](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/spellchecker/ml_ranging/models/med_rubert_tiny2/fine_tune_rubert_tiny2.py).
The model is fine-tuned on a specially collected dataset of over 30,000 medical anamneses in Russian.
The collected dataset can be found [here](https://github.com/DmitryPogrebnoy/MedSpellChecker/blob/main/data/anamnesis/processed/all_anamnesis.csv).

This model was created as part of a master's project to develop a method for correcting typos in medical histories, using BERT models to rank correction candidates.
The project is open source and can be found [here](https://github.com/DmitryPogrebnoy/MedSpellChecker).
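
As an illustration of the ranking idea, the sketch below scores each candidate correction by the probability the masked language model assigns to it at the typo position. This is a simplified, hypothetical example written for this model card; candidate generation and the `rank_candidates` helper are not part of the project code.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("DmitryPogrebnoy/MedRuBertTiny2")
model = AutoModelForMaskedLM.from_pretrained("DmitryPogrebnoy/MedRuBertTiny2")
model.eval()


def rank_candidates(masked_sentence, candidates):
    """Rank single-token candidate words for the [MASK] position by model probability."""
    inputs = tokenizer(masked_sentence, return_tensors="pt")
    # Position of the [MASK] token in the input sequence
    mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_index].softmax(dim=-1).squeeze(0)
    scored = []
    for word in candidates:
        token_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
        if len(token_ids) == 1:  # keep the sketch simple: skip multi-token candidates
            scored.append((word, probs[token_ids[0]].item()))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# The misspelled word is replaced with [MASK]; the candidate list is hypothetical
# and would normally come from, e.g., an edit-distance search.
print(rank_candidates("У пациента сильная [MASK] в груди.", ["боль", "быль", "моль"]))
```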

# How to Get Started With the Model

You can use the model directly with a pipeline for masked language modeling:

```python
>>> from transformers import pipeline
>>> pipe = pipeline('fill-mask', model='DmitryPogrebnoy/MedRuBertTiny2')
>>> pipe("У пациента [MASK] боль в грудине.")
[{'score': 0.4527082145214081,
  'token': 29626,
  'token_str': 'боль',
  'sequence': 'У пациента боль боль в грудине.'},
 {'score': 0.05768931284546852,
  'token': 46275,
  'token_str': 'головной',
  'sequence': 'У пациента головной боль в грудине.'},
 {'score': 0.02957102842628956,
  'token': 4674,
  'token_str': 'есть',
  'sequence': 'У пациента есть боль в грудине.'},
 {'score': 0.02168550342321396,
  'token': 10030,
  'token_str': 'нет',
  'sequence': 'У пациента нет боль в грудине.'},
 {'score': 0.02051634155213833,
  'token': 60730,
  'token_str': 'болит',
  'sequence': 'У пациента болит боль в грудине.'}]
```

Or you can load the tokenizer and model directly and work with them yourself:

```python
>>> from transformers import AutoTokenizer, AutoModelForMaskedLM
>>> tokenizer = AutoTokenizer.from_pretrained("DmitryPogrebnoy/MedRuBertTiny2")
>>> model = AutoModelForMaskedLM.from_pretrained("DmitryPogrebnoy/MedRuBertTiny2")
```
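
For example, with the tokenizer and model loaded as above, you can inspect the top predictions for a masked position yourself. This is an illustrative sketch using standard `transformers` and PyTorch calls, not a recipe from the project itself:

```python
>>> import torch
>>> text = "У пациента [MASK] боль в грудине."
>>> inputs = tokenizer(text, return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits
>>> # Locate the [MASK] position and take the five most probable tokens there
>>> mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
>>> top_ids = logits[0, mask_index].softmax(dim=-1).topk(5).indices[0]
>>> tokenizer.convert_ids_to_tokens(top_ids.tolist())
```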