---
license: cc-by-nc-4.0
language: "en"
tags:
- longformer
- clinical
- biomedical
---

<span style="font-size:larger;">**KEPTlongformer**</span> is a medical-knowledge-enhanced version of Longformer that was further pre-trained using [contrastive learning](https://arxiv.org/pdf/2210.03304.pdf).

### Pre-training
We initialized this model from RoBERTa-base-PM-M3-Voc-distill, released in Facebook's [bio-lm](https://github.com/facebookresearch/bio-lm/).

We then pre-trained it with Hierarchical Self-Alignment Pretrain (HSAP) on the UMLS knowledge graph.
HSAP covers (a) hierarchy, (b) synonym, and (c) abbreviation. For details, see Section 3.3 of the [paper](https://arxiv.org/pdf/2210.03304.pdf).
The learning rate was 5e-5, the weight decay was 0.01, and the Adam epsilon was 1e-5.

### Usage

Try the following sentence with the Fill-Mask task on the right. The masked token is "cardiac".
```
74F with HTN, HLD, DM2, newly diagnosed atrial fibrillation in October who was transferred to hospital for <mask> catheterization after presentation there with syncopal episode.
```

Or load the model directly from Transformers:
```
from transformers import AutoConfig, AutoModelForMaskedLM, AutoTokenizer

# Load the tokenizer, config, and masked-LM weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("whaleloops/KEPTlongformer-PMM3")
config = AutoConfig.from_pretrained("whaleloops/KEPTlongformer-PMM3")
model = AutoModelForMaskedLM.from_pretrained("whaleloops/KEPTlongformer-PMM3", config=config)
```
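As a rough sketch of the Fill-Mask task described above, the snippet below masks "cardiac" in the example sentence and defines a function that would query the model through the Transformers `pipeline` API. The helper names `mask_word` and `predict_masked` are hypothetical and used only for illustration, and the `<mask>` token is assumed from the example sentence above. The model call is left commented out because it downloads the weights.

```python
def mask_word(sentence: str, word: str, mask_token: str = "<mask>") -> str:
    """Replace the first occurrence of `word` with the mask token."""
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    return sentence.replace(word, mask_token, 1)


def predict_masked(sentence: str, top_k: int = 5):
    """Return the top_k fill-mask predictions (downloads the model on first use)."""
    # Imported here so that mask_word stays usable without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="whaleloops/KEPTlongformer-PMM3")
    return fill_mask(sentence, top_k=top_k)


note = (
    "74F with HTN, HLD, DM2, newly diagnosed atrial fibrillation in October "
    "who was transferred to hospital for cardiac catheterization after "
    "presentation there with syncopal episode."
)
masked = mask_word(note, "cardiac")

# Uncomment to query the model; each prediction is a dict that includes
# "token_str" and "score".
# for prediction in predict_masked(masked):
#     print(prediction["token_str"], prediction["score"])
```

Keeping the masking step separate from the model call makes it easy to try other target words in the same note.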

See our [GitHub repository](https://github.com/whaleloops/KEPT/tree/rerank300) for how to use this model with prompts for automatic ICD coding.

That setup gives the following results:
| Metric | Score |
| ------------- | ------------- |
| rec_micro | 0.5844294992252652 |
| rec_macro | 0.12471916602840005 |
| rec_at_8 | 0.4138093882408751 |
| rec_at_75 | 0.8581874197033126 |
| rec_at_50 | 0.8109877644497351 |
| rec_at_5 | 0.2923155353947738 |
| rec_at_15 | 0.586890060777621 |
| prec_micro | 0.6537291416981642 |
| prec_macro | 0.1382069689951297 |
| prec_at_8 | 0.7835112692763938 |
| prec_at_75 | 0.20033214709371291 |
| prec_at_50 | 0.2810260972716489 |
| prec_at_5 | 0.8551008303677343 |
| prec_at_15 | 0.6288256227758008 |
| f1_micro | 0.6171399726721254 |
| f1_macro | 0.13111711325953157 |
| f1_at_8 | 0.54158310388029 |
| f1_at_75 | 0.324835806140454 |
| f1_at_50 | 0.4174099512237087 |
| f1_at_5 | 0.4356905906241822 |
| f1_at_15 | 0.6071345676658747 |
| auc_micro | 0.9653561390964384 |
| auc_macro | 0.8572490224880879 |
| acc_micro | 0.4462779749767132 |
| acc_macro | 0.09732882850157536 |
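
To read the table, it may help to note that the F1 rows appear consistent with the harmonic mean of the matching precision and recall rows. The sketch below checks this for three rows copied from the table; the `f1` helper is illustrative and is not taken from the evaluation code in the repository.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "micro": (0.6537291416981642, 0.5844294992252652, 0.6171399726721254),
    "macro": (0.1382069689951297, 0.12471916602840005, 0.13111711325953157),
    "at_8": (0.7835112692763938, 0.4138093882408751, 0.54158310388029),
}

for name, (p, r, reported_f1) in reported.items():
    # Print the recomputed value next to the reported one for comparison.
    print(f"{name}: recomputed={f1(p, r):.4f} reported={reported_f1:.4f}")
```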