svassileva committed on
Commit fff6568
1 Parent(s): 5d8403c

Update README.md

Files changed (1): README.md +39 -0
README.md CHANGED
@@ -1,3 +1,42 @@
 ---
 license: afl-3.0
+language:
+- bg
 ---
+# MBG-BlueBERT
+
+A model based on BlueBERT and additionally pre-trained on Bulgarian medical and clinical texts.
+
+BlueBERT - https://github.com/ncbi-nlp/bluebert
+
+## Model Details
+
+* Model type: BERT-based model
+* Language(s): Bulgarian
+* Domain: Clinical texts
+* Description: Model based on BlueBERT and additionally pre-trained on Bulgarian medical and clinical texts.
+* Resources for more information: [GitHub Repository](https://github.com/BorisVelichkov/icd10-dl-models-comparative-analysis), [Paper](https://aclanthology.org/2021.ranlp-1.162.pdf)
+
+## Cite as:
+
+```
+@inproceedings{velichkov-etal-2021-comparative,
+title = "Comparative Analysis of Fine-tuned Deep Learning Language Models for {ICD}-10 Classification Task for {B}ulgarian Language",
+author = "Velichkov, Boris and
+Vassileva, Sylvia and
+Gerginov, Simeon and
+Kraychev, Boris and
+Ivanov, Ivaylo and
+Ivanov, Philip and
+Koychev, Ivan and
+Boytcheva, Svetla",
+booktitle = "Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)",
+month = sep,
+year = "2021",
+address = "Held Online",
+publisher = "INCOMA Ltd.",
+url = "https://aclanthology.org/2021.ranlp-1.162",
+pages = "1448--1454",
+abstract = "The task of automatic diagnosis encoding into standard medical classifications and ontologies, is of great importance in medicine - both to support the daily tasks of physicians in the preparation and reporting of clinical documentation, and for automatic processing of clinical reports. In this paper we investigate the application and performance of different deep learning transformers for automatic encoding in ICD-10 of clinical texts in Bulgarian. The comparative analysis attempts to find which approach is more efficient to be used for fine-tuning of pretrained BERT family transformer to deal with a specific domain terminology on a rare language as Bulgarian. On the one side are used SlavicBERT and MultiligualBERT, that are pretrained for common vocabulary in Bulgarian, but lack medical terminology. On the other hand in the analysis are used BioBERT, ClinicalBERT, SapBERT, BlueBERT, that are pretrained for medical terminology in English, but lack training for language models in Bulgarian, and more over for vocabulary in Cyrillic. In our research study all BERT models are fine-tuned with additional medical texts in Bulgarian and then applied to the classification task for encoding medical diagnoses in Bulgarian into ICD-10 codes. Big corpora of diagnosis in Bulgarian annotated with ICD-10 codes is used for the classification task. Such an analysis gives a good idea of which of the models would be suitable for tasks of a similar type and domain. The experiments and evaluation results show that both approaches have comparable accuracy.",
+}
+```
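The `language` key added in this commit sits in the README's YAML front matter (the block between the two `---` lines), which the Hugging Face Hub reads as model-card metadata. As a rough, stdlib-only sketch, the metadata block can be read as shown below. The helper `parse_front_matter` is hypothetical and handles only the flat `key: value` and `key:` plus `- item` subset used in this README; it is not a general YAML parser.

```python
def parse_front_matter(text: str) -> dict:
    """Hypothetical helper: parse a minimal subset of YAML front matter.

    Supports only top-level 'key: value' pairs and 'key:' lines followed by
    '- item' list entries. Anything else is ignored. Not a general YAML parser.
    """
    lines = text.splitlines()
    # Front matter must start on the very first line with '---'.
    if not lines or lines[0].strip() != "---":
        return {}

    meta: dict = {}
    list_key = None  # key whose '- item' entries we are currently collecting
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter: end of the metadata block
        if not stripped:
            continue
        if stripped.startswith("- ") and list_key is not None:
            # List item belonging to the most recent 'key:' line.
            meta[list_key].append(stripped[2:].strip())
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            continue  # not a 'key: value' line; skipped in this sketch
        key, value = key.strip(), value.strip()
        if value:
            meta[key] = value
            list_key = None
        else:
            # 'key:' with no inline value: expect '- item' lines next.
            meta[key] = []
            list_key = key
    return meta


readme = """---
license: afl-3.0
language:
- bg
---
# MBG-BlueBERT
"""
print(parse_front_matter(readme))
# {'license': 'afl-3.0', 'language': ['bg']}
```

For anything beyond this flat subset, passing the text between the `---` delimiters to a real YAML parser such as PyYAML's `yaml.safe_load` is the safer choice.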